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1.1 


1.  Introduction 

If  A  ■  («y)  is  an  a  x 
than  the  Matrix  product  C  ■ 

cik 

for  1  <  1  <  m,  1  <  k  <  p. 

Matrix  Multiplication  and  lta  special  oaaoa  occur  rosy  frequently 
In  nuMtrleal  analysis,  for  exaaplei  the  inner-product  of  two  factors 
(the  case  a  ■  p  ■  1),  Matrix  tlaaa  rector  Multiplication  (the  ceae  p  •  l), 
heck  substitution  when  solving  linear  systems,  iterative  reflnenant  (per¬ 
haps  with  several  right  hand  sides  at  once),  the  power  Method  for  eigen¬ 
values,  In  leest  squares  problems,  and  aany  nore.  H»nee,  It  Is  Interesting 
to  Investigate  algorithm  for  Matrix  aultlpli  cation,  end  in  particular  to 
see  in  what  circumstances  It  Is  posslbls  to  do  better  than  the  straight¬ 
forward  laplsnentatlon  of  the  definition  (l.Cl). 

It  la  clear  that  advantage  nay  often  be  taken  of  special  properties 
of  A,  B  or  C,  e.g.  sparseness  or  symretry,  if  such  properties  are  known 
e  priori.  He  shall  only  consider  the  general  ease  where  no  such  helpful 
properties  are  known,  for  practical  applications,  we  need  only  consider 
Matrices  over  the  rational,  real  and  conplex  fields,  although  the  definition 
above  Makes  sense  for  Matrices  over  any  ring.  The  algorithm  described  will 
all  be  applicable  to  the  problen  of  Multiplication  of  Matrices  over  an 
aitltrery  ccmutative  ring,  and  It  will  later  be  important  that,  for  sons 
of  the  algorithm,  the  ring  need  not  even  be  connutatlve. 
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n  Matrix,  and  B  ■  (b^)  Is  an  n  x  p  Matrix, 
A.B  Is  the  m  x  p  Matrix  (o^)  definsd  by 

*  k  vv 


(1.01) 


V 


1.2 


If  the  algorithm*  are  to  bo  1^1  ament  art  on  o  digital  computer, 
thon  simply  counting arithmetic  opont&ona  con  bo  rothor  misleading, 
for  loads,  atoroa  and  oddroaa  computation*  ore  oloo  important.  The 
boot  teat  la  to  l^loment  the  algorithm*  and  aeo  how  fbat  they  actually 
run,  and  oven  then  the  eonclualon  nay  depend  on  the  prograamr,  compiler 
and  machine  used.  Also,  from  a  practical  point  of  view,  a  tore  ga  re- 
qulreaenta  and  roundoff  error  a  nay  be  vitally  important.  Hence,  after 
deeerlblnf  aeveral  different  algorithms  in  Bee.  2,  X  ahell  dlacuaa 
their  maacrleol  propertlea  in  Boo.  3,  and  deacri'ta  aone  expo ri non tal 
reeulta  in  Sec.  it.  In  Beetlona  $  and  6  an  attenpt  to  find  acne  new 
algorithms  la  described,  and  in  Sec.  7  the  results  are  summarized  and 
some  concluaiona  dram.  The  notation  of  the  definition  (1.01)  will  be 
used  in  Cecs.  2  to  it. 
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2.1 


8 

2.  foown  Wesults 

2.1  The  Mownl  Htthod 

TO  evaluate  the  inner-produet  la  the  definition  (fl.ci)  takes  n 
Multiplications  end  n  -  1  additions.  Hence,  the  a.p  elements  can 
he  found  In  nop  Multiplications  and  win  -  l)p  additions,  and  about  the 
sane  number  of  loads,  stores  and  address  computations. 

If  vs  count  only  Multiplications  then  this  straightforward  Method 
is  known  to  be  optlnal  in  sons  important  special  oases.  If  c  •  p  »  1 
then  we  bare  the  case  of  a  rector  inner-produet,  and  a  alnple  dimensionality 
argument  shows  that,  in  flenersl,  n  Multiplications  are  necessary.  If  p  ■  1 
then  we  bare  the  case  of  Matrix  tines  rector  Multiplication,  and  an  Mul¬ 
tiplications  are  necessary  in  general  (Vinogred,  see  fl]).  In  the  general 
ease,  however,  less  then  anp  Multiplications  are  necessary:  Strassen's 
Method  shows  this  even  when  n  ■  n  •  p  »  2.  Dimensionality  arguments  give 
the  lower  bound  aax  (an,  np,  pa),  but  usually  tills  is  too  lew,  and  the 
best  possible  retmlt  is  not  known.  For  acre  details,  see  Sees.  5  end  6. 

2.2  Wlnoared**  Method 

Vinogred  [7]  has  given  s  Method  based  m  the  following  identity: 

8.J»/2|  fc»/2| 

j“x  aldV  "  (,i,2j-l 4  b2J,k)(ai.2J  4  hgj-l,^ 

fc/2|  W* 

^1  b£J-l.kb2J,k 

(2  .21) 

Here  |*J  moans  the  greatest  Integer  y  <  x,  and  analogously  Txl  means 
the  least  integer  y  %  x  • 
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8.2 


If  n  &•  mo,  tbo  loft  iid«  of  (2*81)  la  Just  o^i  but  If  n  la  odd, 
tha  tom  •iBb|lk  ■not  bo  oddod  to  give  c^.  Tbo  point  of  Mlnogrod's  method 
la  that  tbo  loot  too  tuna  In  (2.21)  can  bo  pmooaputod  and,  oneo  thlo  boo 
boon  done,  noddy  half  tbo  uaual  nodor  of  multiplications  ora  required 
to  ooaputo  oaeb  using  (2*21)  • 

Supposing  for  al^llelty  that  n  la  ono,  lot  ua  oolculato  tbo  abbor 
of  multiplications  and  addition  a  Involved  la  tbo  ooaputo  tlon  of  C  bjr 
Mlnogrod's  aotbod.  Mo  shall  never  dlatlngulsb  botooon  additions  oad  abb* 
traotlona.  To  ooaputo  n/2 

*1  ■  Z  *i,¥j-l*l,2J 

M 

requires  n/2  aultlplleatlons  and  (n/2  -  1)  additions,  and  similarly  for 

n/2 

'»  *  j"j  *2J-l,kb2J,k  .  (2.9) 

Hence,  to  proccaputo  x^,  ...  ,  *B  and  J fg, ...  ,  jrp  takas  (a  ♦  p)n/2 

aultlplloatlooa  and  (a  ♦  p)(n/2  -  l)  additions* 

divan  and  jr^,  to  ooaputo  eik  using  (2.21)  takas  n/2  aultlplloatlooa 
and  (3n/2  ♦  l)  additions.  Ibua  tbs  eoaputation  of  tbo  ontlro  astrlx  pro¬ 
duct  C  takas  ♦  a  ♦  p)n/2  aultlplleatlons  and  (Jnp  ♦»♦  p)n/2  ♦  ap  •  a  •  p 
additions.  Froa  8a e.  2.1,  no  bavo  savod  (ap  -  a  -  p)n/2  aultlpllootlona  at 
tbo  oxpanao  of  (9  ♦  a  ♦  p)n/2  ♦  2ap  -  a  -  p  additions,  in  coaparlaon  with 
tbo  noraal  aotbod. 

81nco  np-a-p«(a-  l)(p  -  1)  -  1,  ttaoro  Is  no  gsln  at  all  If 
1  •  1  or  p  ■  1,  so  tbo  roaarks  abovo  on  tbo  ainlaal  nuabor  of  aultlplleatlons 
required  for  Matrix  tines  vector  multiplication  are  not  contradicted. 

k 
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2.3 


Supposing  for  slspllclty  tint  n  ■  n  ■  p  >  1,  Wlnogmd'a  Method  saves 

(n  -  2)n  /2  Multiplications,  at  tba  expense  of  (n  ♦  6n  •  b)n/2  addition!. 

Hence,  tbara  la  a  saving  la  tba  number  of  nultipllcotlone  if  n  >  V  (recall 

that  we  a •  tuned  that  n  me  even,  but  it  nay  easily  bo  verified  that  there 

la  no  saving  for  n  ■  1  or  3)  •  If  n  is  large  then  about  r? /2  Multiplications 

bavo  boon  traded  for  additions.  If  a  Multiplication  takes  w  tlees  as  long 

as  an  addition,  wo  too  that  blnoarad  tine  m  w  »  3  +  ^n-lj 

Voroal  tiao  2(wfl) 

ao  tba  noat  wo  oan  expect  la  a  gain  of  nearly  5oi  if  w  end  n  are  largo. 

Since  (2.2b)  neglects  loads,  stores  etc.  the  pin  will  probably  bo  rather 
loss  than  this.  Typically  wo  Might  bavo  w  ■  2  (say  reel  Multiplicetlon) 
or  w  m  b  (say  coaplex  Hultlplioatlon),  giving  savings  of  up  to  175  end 
305  respectively.  In  Sec.  b  wo  shell  discuss  bow  large  n  has  to  be  for 
any  pin  in  practice,  and  the  Important  question  of  roundoff  error  will 
be  discussed  in  Sec.  3. 


(2.2b) 


2,3  Straaaen'a  H>thod 


2.k 


Suppose  there  la  an  algorithm  for  the  nultiplicetlon  of  nf  x  nQ 
matrices,  for  a  certain  fixed  Dq  >  1,  taking  M  multiplications  and  A 
additions.  Suppoae  further  that  thia  algorithm  la  applicable  for  ma- 
t  rice  a  over  an  aibltrary  ring.  In  particular,  we  are  not  allowed  to 
aaaume  the  commutative  lav  for  multiplication,  so,  for  example,  Wlnograd'a 
method  la  excluded. 


Let  v(k)  and  v(k)  be  the  number  of  multiplications  end  additions, 
Aspect  lvely,  required  to  multiply  n£  x  n£  matrices,  for  k  ■  0,  1,  2  ...  • 
We  have  v<0)  ■  1,  w(0)  -  0, 

v(l)  <  N,  w(l)  <  A. 


•  a  • 


i. 

with  elements  in  the  (nonconmutatlve)  ring  of  n0  x  nQ  matrices,  so  our 

algorithm  is  applicable.  Applying  it  will  take  M  multiplications,  and  A 
k  k 

additions,  of  Hq  x  nQ  matrices. 


Hence 

and 


v(k  +  1)  <  M.v(k) 


v(k  +  1)  <  M.w(k)  +  A.n0 


2k 


31) 


uxi  UlI  O 

Now  consider  x  ^  matrices  partitioned  into  blocks,  each 
k  k 

block  an  n^  x  Dq  matrix.  Our  matrices  may  be  regarded  as  nQ  x  nQ  matrices 


^(2.32) 


and 


From  (2.31)  end  (2.32)  it  follows  by  induction  on  k  that 
v(k)  <  M* 

»0i)  <  -*-s  (M*  -  n?11) 

(M-n*)  0 


} 


(2.55) 


2  2 

for  any  k  >  0  (provided  that  M  j  nQ,  but  M  <  n0  is  impossible  for 


nQ  >  1  anyway). 


6 


2.5 


Row,  in  order  to  multiply  n  x  n  matrices  for  any  n  >  1,  Just  take 

Ip  U 

k  ■  flogg  nl  and  eribed  the  n  x  n  metrical  in  np  x  metrical  with  the 

lait  n*  -  n  rowi  and  columni  zero,  and  uae  the  above  method.  From  (2.55), 
the  number  of  arithmetic  operation!  required  ii 


C(Mlogn0n)  •  O(nlo*n  aa  n*  •  . 

For  example,  the  normal  method  with  any  n^  >  1  hai  M  »  n^,  lo^  M  ■  5, 
giving  o(nr)  operations,  which  ia  no  lurpriae. 


From  (2.5*0*  iquare  matrix  multiplication  can  be  done  in  O(nP) 
operations,  where  0  ■  log^  M  ■  (log  M)/(log  nc)  .  (It  is  interesting 
to  note  that  0  is  independent  of  A.)  Clearly  there  is  a  constant 

0O  -  inf  (0  |  C(nP)  operations  suffice]  • 


The  normal  method,  and  Uinograd1  •  method,  both  show  that  0Q  <  5,  while 
the  results  discussed  in  Sec.  2.1  show  that  0Q  >  2.  The  actual  value  of 
0q  is  not  known.  While  it  might  be  considered  "intuitively  obvious"  that 
00  a  5,  this  is  false:  as  Strasaen  [5]  has  shown, 

80<1°82<^2-8  • 


Strassen's  idea  is  to  give  an  algorithm  for  the  multiplication  of  2  x  2 
matrices  over  an  arbitrary  ring,  with  the  algorithm  Involving  7  multipli¬ 
cations  (instead  of  the  usual  8)  and  18  additions  (instead  cf  the  usual  *0. 
Putting  Uq  ■  2,  M  ■  7  and  A  ■  18  in  the  above,  his  result  follows* 


(2.5*0 


(2.35) 


(2.56) 
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•  % 


II 

0 

0 

fl 

0 

fl 

D 

II 

0 

II 

II 

0 

9 

9 

1 

I 

1 

I 
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2.6 


Strassan's  algorithm  is  based  on  the  following  identities! 


if 


then 


and 


where 


and 


eA 

C22/ 

- 

(*11  *12 
\*21  *22j 

[bn  bia\ 

’  lb21  b22 1 

C11 

m 

ql  '  *5  ’  S  *  *T  • 

C12 

m 

%"h‘ 

C21 

m 

C22 

m 

m*2  m  %+  *5  +  *6  ’ 

*1 

m 

**11  ■  *lPb22  * 

m 

**21  ‘  a 22^11  » 

% 

m 

*22(bU  ♦  • 

m 

*11**12  +  b22^  • 

*5 

m 

(•u  ♦  .ggXkgj  •  »u)  . 

% 

m 

**11  +  “21)(bll  +  b125  » 

9 7 

m 

**12  *  *22^*b21  +  . 

\ 


>(2.37) 


j 


Strasren  in  [5]  gives  no  hint  of  how  the  identities  (2.37)  were 
discovered,  and  they  are  certainly  not  innediatcly  obvious.  1  shall  give 
a  "graphical"  method  which  makes  tho  ideas  clearer,  ond  which  enables 
one  to  rediscover  the  identities  (2*37)  in  o  few  minutes  if  they  are  not 
at  hand.  We  want  the  four  sums  of  products 

cik  *  ■n1’]* *  (i,  v.  .i,  a). 

This  might  be  represented  dlagraamatically  thus: 


*21 

21 

11 

b22 

X 

22 

12 

bu 

21 

11 

j r 

b12 

22 

12 

where  we  wont  the  four 
sums  of  products  which 
correspond  to  similarly 
labelled  squares. 


*21  *11 


*22  *12 
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2.7 


•  • 

A  produet  (aa  +  *^0^  ♦  V  ■lght  be  rep  retented  as: 


(the  signs  of  the 
terms  ere  not 
represented  in  the 
diagram) 


Now  consider  the  representetiona  of  the  seven  products  <i1# 
of  (2.37).  For  example, 

\  * 

It  le  lmediately  Obvious  frost  the  diagrams  that  we  can  combine  and 
linearly  to  give  terms  involving  the  products  a12b22*  and  anb22* 

It  is  conceivable  that  for  e  suitable  coofcination  the  a^b^  term  will 
drop  out  and  leave  c^.  If  the  reader  now  draws  the  representations  of 
q1#  ,  q^  and  sees  bow  they  combine  according  to  (2.37)  to  give 

Cjj,  ...  |  Cgg,  he  will  see  that  one  could  reconstruct  the  identities 
(2.37)  from  the  easily  remembered  graphical  representations,  apart  from 
ambiguities  in  sign.  A  little  thought  and  juggling  of  signs  will  then  give 
«  set  of  identities  equivalent  to  Strassen's  (there  may  be  «»  trivial 
pe mutation  of  the  suffices) . 

It  is  interesting  to  experiment  with  other  graphical  representations 
end  convince  oneself  that  it  is  Impossible  to  multiply  2x2  matrices  in 
less  than  seven  multiplications.  Wlnograd  [8]  claims  to  have  proved  this. 
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In  8* c.  k  we  shell  discuss  bow  to  Ijylunt  Strsssen'a  aetbod  for 


e.8 


rectangular  Matrices,  and  bow  to  avoid  any  wasteful  "bordering”  with 
zeros.  The  question  of  roundoff  arrora  will  ba  discussed  in  8ec.  3. 


* 
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3.1 


3.  Irror  Analysis 

The  most  important  ease  in  practice  is  that  of  real  matrices  and 

limited-precision  float  inf -point  computation.  I  shall  use  Wilkinson's 

notation  [6],  and  assume  all  arithsMtic  operations  are  done  in  t-diglt 

* 

rounded  binary  arithmetic  ,  except  that  some  operations  may  be  especially 
noted  to  be  done  in  double-precision  (2t-dlgit).  Wilkinson's  assumptions 
concerning  the  method  of  rounding  or  truncating  will  be  made.  Some  of 
these  assumptions,  e.g.  binary  arithmetic,  do  not  hold  for  the  IBM  360, 
and  this  will  be  discussed  later.  For  simplicity,  all  matrices  will  be 
assumed  to  be  square  (n  x  n). 


It  will  be  convenient  to  use  the  norm 


(note  that  ||XY||m  <  ||x)|M- ||y||m  is  generally  false).  This  norm  will  usually 
be  written  just  as  ||x|j  .  The  results  obtained  may  be  expressed  in  terms  of 
more  usual  matrix  norms  by  using  the  attainable  bounds 

IPIIm  <  M,  <  n.||x||H  . 

where  q  stands  for  1,  2,  •  ,  or  E. 


Wilkinson  [6]  def.nes  numbers  t^  snd  tg  which  are  slightly  less  than  t 
Wherever  t^  or  tg  appear  there  is  the  implicit  assumption  that  n.2-t  <  o.l, 
which  is  no  restriction  in  practical  cases. 


*  The  analysis  is  similar  with  any  base  0  >  2,  and  in  most  cases  the  same 
bounds  will  hold  with  2~*  replaced  by|g1*t  .  For  a  discussion  of  WinO- 
grad's  method,  and  some  further  applications  of  (2.21),  with  base  0  >  2, 
see  [12]  . 
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(3.01) 


(3.02) 


3.2 


3.1  The  jtogjl  Method 


Wilkin*  on  [6]  a  hows  that  if 


then 

He  notes  that  if 


C  -  n(A.B)  ■  A.B  +  E 
Mr  <  (2  'l).n.||A||..M 


ll»itE  «  We.||b||e 

error  in  C  may  be  high.  On  the  other  hand,  if  the  inner-product*  are 


then  the  relative 


accumulated  in  double-precision, 


-t 


*  -2t._ 

^n  «  2 


then  i|E|jE  <  2  .||AB||e  +  ^-.2  •l|AliE.||B||E  , 

and  hence  the  relative  error  in  C  will  be  low  unless  there  ia  so  ouch 


cancellation  that 


||a!!e.|1b||e  jt 

>  n 


I|ab||. 


To  get  a  bound  in  term*  of  the  norm  ||.||M,  conaider  a  typical  term 
in  the  product  C.  Such  e  term  will  be  an  inner-product 


“4v.>  - 


+  e  say. 

If  the  sum  ia  accumulated  in  the  natural  order,  we  have 
-t. 

I e |  <  2  x.(n.|x1|.|y1|  +  n.|x2|.|y2|  *  (n-l) .jx^l . Jy^l 

+  ...  +  2. |xn|.|yn|)  , 

•t  2 

so  | e |  <  2  1.  (n  .  ±  ?n-~  2).  mex|xi|.max|yi| 

As  the  x^  are  elements  cf  A,  the  elements  of  B,  (3.15)  and  the 
definition  (3.01)  give  -t  ,2  . 

(3.12)  a'nd  (3.16)  are  of  the  same  form 

!!e1|<  2  \  f(n).||A||.|lBll  , 

and  a  bound  of  this  form,  with  some  reasonable  f(n),  is  the  best  we  can 
expect  for  any  single -precis  ion  method. 


(5.U) 

(3.12) 


(3.13) 


(3.U0 

(3.15) 


(3.16) 


(3.17) 
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3.3 


For  double -precision  accumulation  of  inner-products,  the  bound 
corresponding  to  (3*13)  it 

mpf 

I!e!Ih  <  2’t.|lAB|lM  +  J  .  (n2  +  3n  -  2)  .2  <2.||A|jM.|!B||M  .  (3.18) 

Again,  unless  there  is  exceptional  cancellation,  the  relative  error  in 
C  will  be  low. 


1.2  Winograd's  Method 


First  consider  a  simple  inner-product 


P  -  fl(y  -  (5  +  *))  * 


where 


n/2 

7  "  n(  ^  ^*2J-1  +  y2j^X2J  +  y2,1-l^  * 
n/2 

5  *  fl(  £  X2j-lX2J  ' 
n/2 

T)  =  fl(  ^  y2Jy2J-l)  » 


computed  by  Winograd's  method  (n  even) . 


A  simple  example  illustrates  what  can  happen  when  limited-precision 
arithmetic  is  used.  Suppose  we  arc  using  ^-decimal  floating  arithmetic,  n  ■  2, 
X1  *  X2  B  yx  *  3^2  ■  1.000' -3  • 

Then  g  =  1.000'+6 

and  ■  1.000' -6  (both  exactly  correct), 

but  y  ■  1.00C+6  (instead  of  the  exact 

1.000002000001 '+6), 

so  p  «=  0.000  instead  of  2.000  .  The  difficulty  is  in 

forming  ^(^j-i  +  y2J^  etc*  when  the  elenien'ts  of  *  my  differ  widely  in 
magnitude  from  the  elements  of  £  .  This  conclusion  will  also  follow  from 
the  rigorous  error  analysis  below. 
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Let  a  ■  mx  jxj  end  b  ■  nx  jyj  , 


and  let 


»/2 

C  -  ^  X2J-1X2J  +  etC* 


From  (^.15)  with  n  replaced  by  n/2  we  get 
l€gl  £  2  *.a2.(n2  6n  -  8)/8  , 


<2  .a  .(n  6n  -  8)/8  ,  ^ 
and  similarly  -t.  0  «  / 

1^1  £  2  •»  •(»  +  6n  -  8)/8  .  J 

If  fl(  x  +  y)  -  x  +  y  +  (x  any  xi#  y  any  y±) 

then  le^y]  <  2*t.(|x|  +  |y|) 

<  2-t.(a  +  b)  . 

Thus  fl((x+y)(x’+y'))  -  (1  +  eQ)(x  +  y  +  e^Cx*  +  y*  ♦  «2) 

-  (x  ♦  y)(x*  +  y')  +  e,  say, 
where  JcJ  <  2-t  and  |c±  |>  h2|  <  2-t(a  +  b)  . 

By  expending  (3*25)  it  follows  that 
|e3l<2  '.3.(«  +  b)*  , 

where  t,  is  defined  by 

3 

-t,  -t  -2t  -3t 

2  5  ■  2  +  2  +2  , 

(so  in  practice  tj  act). 

-t,  9 

Hence  |eyJ  <  2  ,(3n/2).(a  +  bj  + 

2  1.((n2  +  2n  -  8)/8)(a+b)2(l+3.2  3)  . 

In  all  practical  cases 

■t.  p  "t-  -t_ 

(3n/2  +  3.2  un  +  2n  -  8)/8)).2  5  <  (3n/2).2  1  , 

and  with  this  assumption  we  get 

Kl  <  2’\((n2  +  I4n  -  8)/8).(a  +  b)2. 


(3.22) 


(3.23) 


(3.2b) 

(3.25) 


(3.26) 


(3.27) 


Ik 


3*5 


Fro*  (3.23)  and  (3.27)#  the  error  <  in  p  is  bounded  by 

1*1  <  a1  f((»*  ♦  l*H>  •  8)/8)(.  ♦  *)2  ♦  ((®2  ♦  -  8)/8)(.2  ♦  »2) 

♦  |r  -  t  -  Hi  +  |{|  +  |1li] 

(terns  of  order  2  beve  been  neglected#  but  they  ny  be  dealt  with  as 
above  (see  [12])). 

Now  |r  -  5  -  H|  <  nab  ♦  0(2-t)  ,  |{|  <  §  a2  +  |e?|,  jt||  <  §  b2  +  |«t,| 
and  a2  +  b2  <  (a  +  b)2  , 

•o  | « |  <  2-tl.  r2  -  ^  .  (a  ♦  b)2  . 

By  considering  (3*29)  with  n  replaced  by  n  -  1  and  a  term  added  for 
the  error  in  computing  and  adding  xQyn#  it  may  be  shown  that  (3*29) 
holds  whether  n  is  even  or  odd,  and  bounds  the  error  in  computing  an 
inner-product  by  Winograd's  method.  From  (3*29)  we  obtain  the  bound 

„imi  <  a  n2  *  r  - 8  ■  (i*i  ♦  IN)2 

for  matrix  multiplication  by  Winograd*  s  method.  (A  slightly  stronger 
result  than  (3*29)  can  be  Obtained  if  a  ■  b,  see  [12].) 


(3.28) 


(3.29) 


(3.210) 


Suppose  ||a)|  /  |]b||  ■  k.  (Assuming  k  f  0  or  •) 

Then 

(11*11  +  IWI)2  -  (k  +  2  +  1A)-II*I|.||B||  , 

which  shows  tint  (3*210)  will  be  much  worse  than  (3.16) 
when  k  is  very  small  or  very  large,  and  this  is  verified 
by  the  example  above. 


Scaling 

Ignoring  the  cases  ||A)j  ■  0  and  ||b||  ■ 
find  an  Integer  X  such  that  1/2  < 


0,  it  is  always  possible  to 
<  2.  Hence  a  practical 
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3.6 


# 


scheme  would  be  to  cosqpute  ||a||  and  ||b||  (In  0(n2)  opera t lone),  find  X  , 
and  than  apply  Winograd's  aatbod  to  2^A  and  2"*B  rathar  than  to  A 
and  B.  It  this  is  dooe,  than  since 

■sx  (k  +  2  ♦  1/k)  -  9/2, 

1/2CK2 

we  gat,  in  place  of  (5.210),  the  bound 

IN  <  a  +  i2»  •  8)-IWMN  . 

which  is  of  the  fora  (5*17)  and  is  not  Much  worse  than  (5*16). 

This  shows  that  Winograd’s  method  is  feasible  provided  some  fora  of 
scaling  is  used  to  mke  ]|a||  ~  )]B]{  .  Without  scaling,  the  results  My 
easily  lose  all  significance.  This  does  not  seem  to  have  been  mentioned 
by  anyone  recommending  the  use  of  Winograd's  method:  e.g.  blindly  fol¬ 
lowing  the  procedure  recommended  in  [2]  could  lead  to  disaster. 

A  more  sophisticated  fora  of  scaling  could  be  used,  but  it  is  im¬ 
portant  to  keep  the  time  for  scaling  to  a  minimum,  or  Winograd's  method 
becomes  slower  than  the  normal  method.  The  extra  time  taken  by  scaling 
will  be  considered  in  Sec.  U. 

If  it  is  easy  to  accumulate  inner-products  in  double-precision  then 
this  may  as  well  be  done.  The  error  bound  will  still  be  like  (5*211) 
though,  unless  the  terms  +  to2J,k  and  ai,2j  +  *2J-l,k  of 

are  computed  in  double-precision.  Then  we  get  a  bound 

IN  <  2  "-M  +  2  +  I2n  -  8).||a||.||b||  , 

provided  that  the  terms  x±  and  yfc  of  (2.22),  (2.23)  are  kept  in 
double-precision,  and  assuming  scaling  as  above.  (5.212)  is  very  similar 
to  (3 .18)  and  the  same  remarks  apply. 


(3*211) 


(3*212] 
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3.7 


3.3  Btwen1 1  Method 

Assuming  a  bound  ||e||  <  2’t.f(n) .||a||.||b1| 
for  n  x  n  matrices,  it  is  possible  to  deduce  a  similar  expression  for 
2n  x  2n  matrices,  if  the  multiplication  of  these  matrices  is  reduced  to 
the  multiplication  and  addition  of  n  x  n  matrices  using  Strassen's 
identities  (2.37).  This  gives  f(2n)  in  terms  of  f(n),  and  as  (3.31) 
is  certainly  true  when  n  ■  1  (with  f(l)  ■  l),  ve  can  find  f(n)  for  n 
an  integral  power  of  2.  If  the  "bordering"  method  is  used  for  general 
n  then  the  zeros  will  have  no  effect  on  the  error,  so  the  bound  for  the 
next  power  of  two  may  be  used. 


To  express  f(2n)  in  terns  of  f(n),  let  A,  B,  and  C  be  2n  x  2n 
matrices  (deviating  slightly  from  our  usual  notation),  and  regard  A,  B, 
and  C  as  2x2  matrices  with  n  x  n  blocks.  Consider  forming  C  ■ 
fl(A.B)  using  the  identities  (2.57).  Terms  of  order  2-2t  will  be 
ignored,  for  although  they  may  be  dealt  with  by  replacing  t  by  t'sart 
as  we  replaced  t  by  tj,  tg  and  t^  in  Sec.  3*2,  this  complicates  the 
argument,  and  the  results  are  not  significantly  different.  For  brevity 
let  •  -  IMItf  b  -  l|B|lH  . 

The  error  in  computing  ^  of  (2.37)  will  be  denoted  by  E^,  so  for 

example  fl((au  -  a12^22^  "  (au  -  *12^22  +  Eql  ^where  an'  *12* 
bgg  and  are  n  x  n  matrices)  •  Similarly,  the  error  in  computing 

c^  of  (2.37)  will  be  denoted  by  E^.  Thus 


C 


fl(A.B) 


A.B  +  E,  where  E  ■ 


*11  i  *12  \ 


(3.3D 


(3.32) 
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Since  q^  ■  -  alg)  .b,:  J  ,  where  the  n  x  n  matrix 

multiplication  ia  done  by  Strassen's  method  with  the  error  bound  (3*31)# 
and  the  matrix  addition  is  done  in  the  uaual  way,  we  have 

IIV'I  s  2"t(n  *  +  IM)-M  • 


oo  l|Eqlll  <  2mt.2*b.(n  +  f(n))  , 

and  similarly  for  E^2,  E^,  and  .  For  i  ■  5,  6  and  7 

(3.33) 

we  get  the  bound 

\\\i\\<  2-t.Uab.(2n+f(n)) 

(3.34) 

in  the  same  way* 

Now  it  follows  from  (2*37),  neglecting  terms  in  2~  ,  that 

IM  <  II v» +  n  ♦  a^Kil +  KID  . 

(3.35) 

but 

C  2nab  for  i  -  1,  2,  3,  4 

M  <  (  , 

1  V.  Unab  for  i  -  5,  <5,  7  , 

}(3.36) 

so  from  (3*33),  (3*35)  and  (3*36)  we  obtain 

||E12!|  <  2’t.4ab.(2n  +  f(n))  , 

(3.37) 

and  clearly  the  same  bound  holds  for  E0^.  Similarly  we  have 

IM  <  Kill +  'M +  iM' +  M + 

2‘t(Jk1ll  +  5|k3ll  +  aKii  *  kid 

(3.38) 

(assuming  q^,  q y  and  q^  are  added  in  this  order), 

so  llEjjjl  <  2"t.ab.(44n  +  12f(n))  , 
and  similarly  for  E22* 

(3.39) 

From  (3*37)  and  (3*39)  we  see  that 
||E||  <  +  ««»)). ||A||.||b||  , 

(3.310) 

so  (3*31)  will  hold  if  f  satisfies  f(  1)  ■  1  and  f(2n)  ■  44n  +  I2f(n)  * 
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(3*311) 


3.9 


By  induction  on  k,  It  follow  fro*  (5. 311)  that 

t(2*)  m  ^(27.12k  -  22. 2^)  ,  (3.312) 

so  t( 2*)  <  S.l2k  .  S.(2k)l0«212  .  (3.313) 

|r 

Hanes,  for  general  n,  taking  k  such  that  n  <  2  <  2n  , 
we  have  ||e||  <  2"t^5ne.||A||.||B|| 
where  c  ■  logg  12  ar  3.58  • 

(3.314)  gives  a  bound  for  the  error  in  matrix  multiplication  by 
Strassen's  method,  as  described  in  See.  2.3.  The  bound  is  of  the  fora 
(3.17),  although  the  function  ^n3*5®  increases  rather  more  rapidly 
than  we  would  like.  On  the  other  hand,  all  the  error  estimates  obtained 
here  are  rather  pessimistic,  for  the  individual  rounding  errors  are  un¬ 
likely  to  be  correlated  in  the  worst  possible  way.  If  our  bound  is 
2~*f(n)||A||.||B|j  then  the  actual  error  is  probably  about  2_t  Jt(n)  ||a||.||b|| 

(see  Sec.  4.6). 

The  analysis  above  assumes  that  a  "pure”  form  of  Strassen's  method 

is  used.  In  practice  it  turns  out  that  Strassen's  identities  will  be 

applied  until  the  matrices  to  be  multiplied  are  of  order  **  100  or  less, 

and  then  the  normal  method  will  be  used  (see  Sec.  4.3).  Supposing  we 

k 

have  matrices  of  order  2  .nQ,  and  apply  Strassen's  identities  k  times, 
multiplying  the  matrices  of  order  n^  by  the  normal  method.  Then  (3*3U) 
holds  with 

f(n0)  -  (njj  +  3^  -  2)/2  (from  3.16)  , 
so,  assuming  n^  >  6  ,  we  have 

f( 2%)  <  l6kn£  .  (3.315) 

Thus,  for  n  x  n  matrices,  the  bound  becomes  ||e||  <  2"t.4k.n2.||A||.||B||  .  (3.316) 


} 


(3.31*0 
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Since  k  will  be  very  Mill  in  practice,  the  bound  (3*316)  it  net 
too  bed.  Cohering  it  vlth  (3*16),  it  eppeert  that  w  nay  lea#  up  to 
tvo  bite  of  accuracy,  conpared  to  that  of  the  noraal  net  hod,  each  tine 
Stracsenb  identitiea  are  applied  recursively. 

In  using  straaeen'a  net  hod  there  doea  not  aeen  to  be  nueh  point  in 
doinc  tone  of  the  arlthnetie  in  doUble-preoition,  unless  it  oan  ell  be  done 
in  double-precision,  when  the  above  bounds  hold  with  t  replaced  by  2t 
(and  a  factor  of  3/2  with  Wilkinson's  assunptions  about  the  nethod  of 
rounding  or  truncating) . 

It  is  interesting  to  note  that  with  Straesen'a  nethod  there  Is  no 
point  in  scaling  the  Matrices  so  that  Ha|MIb|1  .  ibis  is  because,  unlike 
Winograd's  identity,  Strassen's  Identities  never  involve  the  addition  of 
an  elenent  of  A  to  an  elenent  of  B. 

3.t  Complex  Arithmetic 

The  above  analysis  is  based  on  the  assunptions  that  fl(i  ♦  y)  ■ 
x(l  ♦  tj)  ♦  y(l  ♦  c2)  and  fl(xy)  »  xy(l  ♦  c^)  where  |«t|  <  2*1, 
i  ■  1,  2,  3*.  These  assusptiens  vill  be  valid  for  complex  arlthnetie  too, 
provided  that  t  is  decreased  by  a  sasll  anount  (2  or  3)  depending  on  bov 
the  arithswtic  is  done.  Hence,  vlth  this  snail  change  in  t,  the  above 
bounds  vill  hold  for  complex  nstrix  multiplication.  Similar  renarks 
apply  to  real  arithmetic  done  on  a  decimal  or  hexadecimal  ns  china  (e.g. 
the  IBM  3<>C).  A  curious  anomaly  which  appeared  when  Winograd's  nethod 
was  being  tested  on  an  ISM  360/67  computer  is  described  In  Sec.  k.6  . 

*  A  stronger  assumption  about  addition,  used  in  Section  3.2,  was  not 
really  necessary  (see  [12]). 
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4.1 


In  order  to  cooper#  the  normal,  Wlnegrad's  end  dtnsscn's  methods 
in  practice,  they  were  ell  implemented  in  ALOOLW  (It)  on  *n  IBM  3UC/A7 
computer.  Doubtlecc  ell  three  Method*  would  run  Aster  if  coded  In, 
ssy,  FORTRAN-H  or  assembly  language,  but  their  relative  speed*  would 
probably  be  about  the  uai.  While  it  would  be  coay  enough  to  code 
the  naraal  Method  and  Winograd's  Method  In  POKTRAJI  or  assembly  language, 
for  St  ra  a  sen' a  Method  It  la  very  convenient  to  have  a  language  which 
nllows  recursive  procedure  calls.  The  sinplest  way  to  code  Straseen's 
Method  In  a  language  like  FORTRAN  would  be  to  llnlt  the  depth  of  re¬ 
cursion  and  duplicate  any  subroutines  which  would  naturally  be  called 
recursively.  The  three  Methods  were  tested  on  both  real  and  copies 
Matrices,  with  results  which  will  be  summrlzcd  below. 

All  three  Methods  were  coded  In  the  fens  of  s  pure  procedure, 
with  calling  sequence 

nsMt  (A,  B,  C,  M,  ft,  P) 

to  fora  C  :■  A.B  ,  where  A  is  an  N  x  N  nstrlx  (dimensioned  fl  ::  K, 

1  ::  N)),  B  is  N  x  P,  and  C  Is  N  x  P.  Calls  such  as  nane  (A,  A, 

A,  N,  N,  N)  are  valid,  and  correct  results  should  be  returned  for  *#ny 
14  V  and  P  >  1,  provided  enough  temporary  storage  la  available. 

At  first  tbs  procedures  were  coded  so  that  tbs  Inner  loops" Involved 
references  to  doUbly-sUbscrlpted  array  element 3.  In  ALOOLW  such  re¬ 
ferences  take  considerably  longer  then  references  to  slngly-subscripted 
array  alananta  (11],  and  it  was  found  that  all  the  procedures  could  ba 
speeded  up  by  passing  cross-sections  of  tvo-dl mens  1  one  1  arrays  as  Para¬ 
na  tors  to  procedures  which  then  operated  on  than  as  one -dimensional 
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imjWi  (This  1«  not  allowed  In  ALOOL-60.)  For  example,  instead  oft 
For  I  (■!  until  N  do 
for  J  t-  1  untU  II  do  A(X,J)  t>  B(I,J)| 

W»  UMI  / 

For  I  i>  1  until  N  do  aasl*i  (A(X,«),B(I,«),]l)l 
where  ut  hm  dtfintd 

Procedure  asalgn  (rttl  array  Ai  l(#)  I  intagtr  ralua  l) ; 
for  J  <■  1  until  II  do  h(Jl)  :■  B(J) ; 

Ttoa  ateond  font  will  axe  cute  faster  provided  I  >  10  .  Aa  thla  dttriot 
speeded  up  the  normal  method  ratbar  sort  than  Ctnasen'a  natbod,  it  la 
clear  that  a  coapariaon  of  tba  tbraa  nethods  dapanda  on  the  language 
and  tba  progmnmlng  technique  uaad  to  inplanant  than. 

Tba  laplanantion  of  each  natbod  will  now  be  described  in  aora  detail* 

Tba  procedure  for  tba  real  and  ccmpiex  oaaaa  are  vary  alallarv  and  lift- 
Inga  for  tba  real  caaa  are  given  in  tba  Appendix. 

b.l  Tba  Morael  Method 

(Procedure  MATNULT,  aaa  Appendix,  lines  ?63-311.)  There  art  no 
particular  difficulties  in  tba  iaplenentatien  of  thla  natbod.  Baoauaa 
of  tba  possibility  that  C  la  the  sane  aa  A  or  B  in  the  call,  tba  product 
la  formed  in  a  temporary  array  Q  and  then  transferred  to  C.  Thus  N.P 
words  of  temporary  storage  art  uaad.  Inner-products  are  accumulated  in 
double-precision,  for  in  ALOOUf  this  la  vary  nearly  aa  fiat  aa  accumu¬ 
lation  in  alngle-preelslon.  Hence  the  error  bounds  (3.15)  and  (3.18) 
are  applicable  (with  the  alteration  noted  in  See.  3.A),  and  in  moat  oaaaa  each 
Oy  will  be  tba  correctly  rounded  result,  although  this  oen  not  be  gunnntead. 
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id1  a  Method 


(Procedure  NXROGI RAD,  saa  Appendix,  linaa  219-265.)  A#«ln  the 
laplauentaticn  la  fairly  straight-forward.  The  eetrices  A  and  B 
•ra  aoalad  ia  daaorlbad  In  8ae.  3»2,  and  tha  aoalad  aatricaa  ara 
atorad  teeporarlly  in  array?  D  and  I.  Strictly  speaking,  sealing 
aboold  ba  dona  to  tha  naaraat  power  of  l£  rathar  than  2,  for  aoallng 
by  powara  of  2  could  Introduce  roundoff  arrora  on  tha  360,  and  theae 
arrora  have  not  boon  taken  Into  account  In  the  error  anelyala  (Sac.  3.2). 

Baking  account  of  tbaaa  an  ora  gives  the  error  bound 

mn  <s  •MB-IMp  <*•»> 

where  K  la  a  —11  constant,  lnataad  of  (3*211).  In  the  conplex  cose, 

|R(x)  |  a  |X(x)|  rather  than  |x|  waa  uaad  to  eave  tine.  This  lncraaaaa 
tha  error  bound  by  a  factor  of  at  noct  1.15  . 


The  lnnar-producta  and  yk  of  (2.22),  (2.23)  are  computed  and 
atored  in  tha  array*  X  and  I.  Aa  atated  above.  It  la  not  aigalflcantly 
harder  to  coaputa  and  aave  tha  x^  and  yk  In  doUble-preclsion,  ao  thla 
la  dooa. 

Xn  all,  (n  ♦  2)(n  a  p)  word*  of  t ■potary  storage  are  used,  which 
la  about  twice  aa  nuch  aa  for  tha  noiual  nethod  If  n  ■  n  •  p.  Via  suns 

*  "aj,^  “4  (*1,2J  *  b2J-l,k>  (S,a) 

single-precision,  and  than  tha  Inner-product  involving  thna  la  ccaputad, 
aa  usual.  In  dodble-predaloo.  It  n  la  odd  than  tha  necessasy  correction 
la  aada,  and  tha  final  result  fl(C)  Is  fomsd.  Xt  Is  Interesting  to  note 
tb.tlftb.ra.  ♦  *>8j,k)  “4  ""  “*■ 

puted  In  double -precision,  we  would  be  using  doUble-preolsloo  throughout^ 
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and  the  bound  (3«?12)  would  apply*  Unfortunately,  the  extra  tine  taken  to 
do  this  slows  the  procedure  down  so  that  it  is  never  faster  than  the 
norwal  Method,  so  the  sums  could  only  be  computed  in  single-precision, 
and  the  best  error  bound  we  can  get  is  of  the  fona  of  (fa.2l). 

fa. 3  8t reason' a  Method 

(Procedure  3TKA88EII,  see  Appendix,  lines  6-216.)  The  Method  in- 
pleManted  is  the  following!  First,  if  n,  n  and  p  are  suffieieatly  aall, 
nornal  Matrix  Multiplication  is  used  ( see  below  for  the  precise  criterion)  • 
Otherwise,  m  is  replaced  by  2|p/g  ,  n  by  2|p/3  ,  and  p  by  2 fp/j  • 

A  is  partitioned  into  four  n/2  by  n/2  Matrices  and  B  into  four  n/2  by 
p/2  Matrices,  ignoring  the  last  row  and/or  colon  if  necessary.  The 
block  2  by  2  Matrices  are  Multiplied  using  Straascn's  identities  (2*97)* 
which  Involves  seven  recursive  calls  to  8TOA«899  to  confute  the  m/2  by 
p/2  products  q^,  ...  e.,  (actually  C  is  used  in  placa  of  (ft  to  save 
storage).  Finally,  the  result  is  corrected  if  the  original  %  n  or  p 

o 

were  odd.  This  avoids  wasting  space  and  tine  by  filling  up  the  arrays 
with  seroe  as  described  in  Sec.  2*3  •  In  cans  C  coincides  with  A  or  B, 
sane  values  needed  for  the  correction  step  have  bean  saved  in  arrays  81 
and  32. 

Actually  lnplenenting  the  identities  (2.37)  is  tedious  but  straight¬ 
forward.  The  fist,  general-purpose  procedure  OP  l£  used  to  take  advantage 
of  the  fbclllty,  noted  above,  for  passing  cross-sections  of  arrays  as 
para  asters  to  procedures.  In  forcing  c^  and  c^,  the  terns  ...  q^ 
are  added  before  q^  ...  q^,  for  otherwise  the  error  bound  would  be  in¬ 
creased  slightly.  All  arlthnetlc  is  done  in  single-precision  except 
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4.5 

for  the  accumulation  of  inner-producta  when  normal  matrix  multiplication 

ia  used,  so  the  error  bound  (3*316)  is  applicable.  Because  of  the  double - 

v  2 

precision  accumulation  of  inner-products,  the  term  4  n  in  this  bound  may 
be  replaced  by  5.12*1^  . 

Procedure  IDENTITIES  uses  the  temporary  arrays  T,  U,  Ql,  $2,  ...  ,  Qb  , 
taking  (an  ♦  np  +  6pm) /U  words.  Since  the  procedure  is  called  recursively, 
at  any  one  time  we  may  need  <  (mn  ♦  np  ♦  6pm)(4’1  +  U"2  ♦  4"^  ♦  ...  ) 

■  (an  ♦  np  ♦  6pm)/3  words  of  temporary  storage.  (4.31) 
The  arrays  SI  and  82,  and  the  stack  space  required  for  recursive  proce¬ 
dure  calls,  will  be  negligible  if  a,  n  and  p  are  reasonably  large.  The 
spece  for  the  erray  Q,  used  when  normal  aatrix  multiplication  is  invoked, 
may  be  absorbed  into  (4.31).  Hence  the  temporary  storage  used  is  rough¬ 
ly  bounded  by  (4.31),  and  if  n  •  n  «  p  this  is  8n2/3  words,  or  slightly 
more  than  that  required  by  Winogrsd's  method  and  8/3  times  that  required 
by  the  normal  method.  For  all  three  methods,  the  temporary  storage  re¬ 
quirements  can  be  reduced  if  C  is  not  allowed  to  overlap  A  or  B. 

4.4  Comparison  of  the  Three  Methods 

The  three  procedures  described  above  were  run  under  the  seme  con¬ 
ditions  (idle  with  "nocheck"  option)  for  various  test  matrices  A  and  B. 

Some  running  times  for  the  case  of  square  matrices  are  given  in  Table  1. 

In  each  case  the  depth  of  recursion  in  procedure  81BA88BI  was  kept  at 
exactly  one. 
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Table  1  Running  Tinea  (in  1/60  see.) 

n  ■  n  ■  p  Real  ct«>  Connies  oim 


Woxnal 

tflnoerad 

• 

8t reason 

Boreal  Vinonrad  Stresses* 

20 

26 

3* 

b2 

53  53 

66 

30 

83 

8b 

107 

167  150 

187 

*0 

18b 

18b 

221 

38b  330 

boi 

50 

3*7 

336 

392 

731  615 

7*2 

60 

58* 

557 

636 

*8tressen'e  Mtbod  with  esaetly  ont  recursion.  Ron  tlMS  varied 
slightly,  but  vara  eonatant  to  ♦  If, 

By  counting  operationa  it  ia  dear  that  the  naming  tiaa  of  aaoh 
■athod  ahould  ba  a  cable  in  n,  and  for  Stmaaan'a  Mtbod  tba  eoaffieianta 
will  depend  on  the  depth  of  recursion*  Zt  turns  ont  that  tba  eonatant 
tern  is  negligible,  and  the  tiMs  in  fable  1  are  given  to  ♦  If  bp  etfcica 
T(n)  -  an?  *  bn2  4  on  with  the  following  coefficients: 

fable  2  Cdblc  Coeffieienta.  T  ■  an?  ♦  bn2  ♦  on,  ia  |»  aeo. 

a  b  e 


Rowel 

*0 

270 

2000 

Real 

Wlnoerad 

37 

200 

9500 

• 

Btraaaan 

36 

650 

8000 

Boreal 

90 

320 

2000 

Connies 

Mapped 

73 

220 

11500 

a 

Straasan 

80 

790 

8000 
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Bom  interesting  conclusions  my  be  drawn  frcn  Table*  1  and  2. 

Coapering  tha  noraal  Mtbod  with  Wlnograd'a  Mthod,  wa  aaa  that 
Vlnograd's  will  ba  faster  if  37 r?  ♦  200a2  ♦  9300  <  VOn3  +  270n2  ♦  2000, 
l.a.  if  n  >  Vo  in  tha  raal  case, and  if  73a?  +  220n2  ♦  11300  <  90a?  ♦ 

320n2  ♦  2000,  l.a.  if  n  >  21  in  tha  ecnplex  oaaa,  which  My  ba  verified 
by  In  (paction  of  Tabla  1.  At  n  *  •  ,  Wlnograd'a  Mthod  will  run  in 
37 Ao  ■  929  of  tha  noraal  tiaw  in  tha  raal  oaaa,  and  in  73/90  ■  819 
of  tha  noraal  tlM  in  tha  Caspian  oaaa.  Tha  gains  ara  algnifioant 
for  raaaonably  aall  n:  a.g.  f or  n  ■  100  Wlnograd'a  Mthod  will  aava 
79  (raal)  or  109  (coaplax).  Hence,  for  aodarataly  larga  Mtricas, 

Wlnograd'a  aatbod  laada  to  algnifioant,  though  not  apaetacular,  aawinga, 
and  la  worthwhile  aapaclally  in  tha  coaplax  oaaa. 

Zt  la  worth  noting  hora  that  it  doaa  not  pay  to  radnoa  tha  anltl- 
plloatlon  of  two  oo^lax  n  by  n  Mtrioaa  to  thraa  aultlplioatlona  of 
Mai  n  by  n  Mtrioaa  (plua  aoM  additlona)  by  uaing  (A  ♦  Bi)(C  ♦  Di)  ■ 

(kM) 

(I  •  !)♦  (0  •  I  -  7)1,  wbara  B  ■  AC,  F  ■  BD,  and  0  ■  (A  +  B)(C  ♦  D)  , 
for  coaplax  Mtrlx  anltlplioation  takaa  laaa  than  thraa  tiMa  aa  long 
aa  raal  Mtrlx  nultipll cation  (uaing  any  of  tha  thraa  nathoda). 

Zt  follows  froa  labia  2  that  Btraaaan'a  Mthod  will  ba  fbatar 
than  tha  noraal  Mthod  if  n  >  110  in  tha  raal  oaaa,  and  if  n  >  60  in 
tha  coaplax  oaaa.  Hanea  prooadura  8TBA88EX  should  check  to  aaa  if 
a  <  Bq  (with  n^  sat  at  110  or  60),  and  if  ao  usa  tha  noraal  aatbod. 

If  n>  n^  than  Strassan's  idantltiaa  should  ba  usad  to  raduca  n  to  n/2, 
and  tha  mm  taat  appliad  recursively.  Ihia  is  what  tha  procadura  ac¬ 
tually  doaa,  axcapt  that  nQ  la  not  coaparsd  just  with  n,  but  also  with  a 
and  p  in  oaaa  tha  Mtrioaa  ara  rectangular.  Zt  can  ba  aaan  by  counting 
operations  that  tha  appropriate  taat  is  if  3anp  <  Qq(m  ♦  np  ♦  pa)  rather 
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than  if  n  <  n^.  The  tines  given  1a  Tibia  1  wart  obtalnad  with  nQ  raduead 
io  that  Strassen's  ldantltiaa  would  ba  uaad  exactly  once. 

By  counting  operational  it  oan  easily  ba  sean  that  the  tine  Tg( n) 

for  nultiplication  of  n  by  n  as  trices  using  8t  res  sen's  nathod  should  ba 

(Ivan  by  „  «  2 

fair  ♦  bn  ♦  cn  ♦  d  if  n  <  Dq 

Tfi(n)  -  ^7xfl(n/2)  ♦  a'n2  ♦  b'n  +  o'  if  a  >  n  • 
hroa  (4.42)  it  follows  th  it,  if 

k  ■  mx(0,  [log^n/n^  «■  1)  ,  * 

then  i 

T0(n)  .  (J)"  u?  *  ((f) "  b  ♦  ^({),‘  •  !).•)»* 

+  ((J)"  *  ♦  f«£)k  •  l)b')«  I 

♦  (7^4  ♦  5(7“  -  l)e')  .  J 

The  constants  a,  b,  c  and  d  should  ba  those  given  for  the  nonaal  nathod 
In  Tibia  2  (d  ia  negligible) .  The  constants  a',  b'  and  c'  determined  to  fit 
the  «ato  in  Table  1  are: 

Tabic  3  Constants  In  (4,42)  ( u  sec.) 

Real  ease  a'  ■  190  b*  *  40CC  o'  -  12COOC 

Complex  oaaa  ?2t  4eoo  120000 

The  constants  in  Tables  2  and  3  are  not  very  veil  determined  by  the 
data  (especially  e  and  c'),  and  era  not  exactly  consistent,  for  example, 
frees  (4.42)  and  (4.4$)  we  should  have,  in  Table  2,  ng  ■  7ag/8,  while  the 
Table  gives  eg  ■  36  and  a^  »  40.  The  consistency  is  about  as  good  as  can  ba 
expected  though. 

from  (4.42)  and  (4.4$)  it  follows  that  Tg(n)  •  C(nlof2^)  asn<*«  t 
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co  for  sufficiently  large  aatrlces  Streaaen’s  method  la  arbitrarily  fa  a  tar 

9 

than  the  noraal  aothod  or  Wlnogred's  aothod.  In  practical  cases,  aay  for 
n  <  200,  tho  noraal  aothod  or  Vinogrod'  a  aothod  oppoora  to  bo  factor. 

By  tho  above  fonaulao  wo  can  oatlaoto  that  Stnaaon'a  aothod  will  bo 
factor  than  Wlnegrad's  only  if  n  >  270  (reol  coco)  or  n  >  200  (coaplex 
coco).  On  tho  other  hand,  thoao  chongoovor  pointa  are  vary  aonaltlvo 
to  changoa  in  prograaaing  techniques  etc.,  ao  it  ia  conceivable  that 
Straosan'o  aothod  would  bo  tho  fastest,  in  aoao  language  on  aoao  aa chine, 
for  aatrlcee  of  order  ~150.  In  aoet  practical  cocoa,  Wlnograd's  aothod 
will  bo  tho  faotoot,  except  that  tho  noraal  aothod  will  bo  fbator  for 
aufflclently  aaall  aatrieoa. 

4.5  Baaed  Machines 

Boat  aaehlnoa  (o.g.  tho  Burrougha  B5500)  hove  o  fairly  aaall  phyalcal 
aoaory  but  o  largo  "virtual"  aaanry.  The  uaor'a  prograa  and  data  la  divid¬ 
ed  into  "pages",  aoaa  of  which  aay  bo  hold  in  fast  core  aaanry,  and  tho 
otboro  on  a  device  such  as  a  disc  or  diva.  When  reference  is  node  to  a 
page  which  la  not  in  aaanry,  a  hardware  interrupt  occurs,  »nd  tho  required 
page  la  rood  into  aaaoiy  froa  tho  exteraol  device  (to  aake  roea  for  it,  a 
page  aay  hove  to  be  caved  on  tho  device)  •  Wo  say  that  a  "pegs  fault"  has 
occurred.  Aa  a  relatively  slow  external  derloo  is  involved,  pegs  faults 
ere  very  tlao-consualng  and  should  be  avoided  oa  aueh  os  possible,  (fbr 
o  discussion  of  tho  concepts  of  virtual  aaanry,  paging^  sepantatlon  etc. 
see  Bandell  and  Kuohner  (9l.) 

No  Keller  and  Ooffhan  [4]  have  considered  the  number  of  pegs  faults 
which  will  occur  when  certain  aatrlx  operations,  including  aultlplication, 
are  performed  on  largo  aatrlces  using  a  aa  chine  with  poging  like  that 
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described  above.  They  conclude  that,  for  a  slight  modification  of  the 

normal  method  of  matrix  multiplication,  it  ia  better  to  store  a  large 

matrix  by  submatrices,  with  each  submatrix  fitting  into  a  small  nuriber 

of  pages,  than  by  rows  or  columns.  Even  then,  the  number  of  page  faults 

3 

will  increase  like  n  for  sufficiently  large  n.  Similar  arguments  would 
apply  to  Winograd'a  method,  again  suitably  modified. 

unlike  the  normal  method  or  Winograd'a  method,  Strassen's  method 
would  perform  well,  with  eventually  0(n2*®)  page  faults,  even  when 
simple  row  or  column  storage  is  used*  This  is  because  the  only  matrix 
operations  on  matrices  with  n  >  n^  are  assignment  and  addition  operations, 
and  these  cen  be  performed  as  efficiently  when  row  or  column  storage  is 
used  as  for  any  other  method  of  storage.  A  few  modifications  to  the 
procedure  STRASSEIf  in  the  Appendix  should  be  made.  nQ  should  be  de¬ 
creased  if  necessary  so  that  iIq  by  nQ  matrices  can  be  multiplied  in 
core  (without  any  page  faults).  Also,  inner  loops  should  Involve  opera¬ 
tions  on  one  row  rather  than  on  one  column,  if  row  storage  is  used. 

Thus  we  should  change  double  loops  like 

For  J  :■  1  until  If  do  for  I  :■  1  until  M  do  ... 
to  For  I  :■  1  until  M  do  for  J  :■  1  until  If  do  ...  . 

This  also  applies  to  the  "implicit"  loops  when  procedure  OP  is  called: 
e.g.  lines  138  -  139  should  be  changed  to 
For  I  1  until  M2  do 
0P(  T(  I,  *) ,  A(  1,  *)  ,  A(  I,  *) ,  M2,  0,  If  2,  -l) ;  . 

Hence  Strassen's  method  might  be  competitive  with  the  other  methods  for 
smaller  values  of  n  on  a  paged  machine  than  on  a  machine  without  paging. 
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4,6  Rounding  Errors 


The  procedures  were  tested  using  matrices  with  elements  uniformly 
distributed  in  (-1/2*  +1/2),  or  with  real  and  Imaginary  ports  having 

O 

this  distribution.  ||E||g  and  ||e|(m  were  computed,  assuming  that  the  normal 
method  gave  exact  results,  which  is  rea soluble  considering  the  error 
bounds  (3*13)  and  (3*18).  As  expected,  the  error  bounds  (3.211)  and 
(3*316)  of  the  form  j|E||  <  2'tf(n)|jA||.||Bj|  were  too  pessimistic,  and  the 
actual  ||E||  was  more  like  \ft(n)  ||A||.||B||  :  See  Table  4. 

Table  4  llE»lL|  /  l£l  JtM  iiAii„HBii„) 


n 


Real  Strassen  Complex  Btrasaen  Complex  Wlnograd 


30 

40 

(taking  f(n) 


0.27  0.28 

0.20  0.83 


+  12n  -8)  for  Winograd, 
for  Strassen,  and  t  ■  21) 


0.28 

0.24 


A  surprising  result  occurred  with  Winograd' s  method  in  the  real  case. 
The  single-precision  results  agreed  exactly  with  those  given  by  the  normal 
method'.  This  might  be  expected  if  the  error  bound  (3.212),  rather  than 
(3*211),  were  applicable.  The  anomaly  is  apparently  caused  by  the  special 
nature  of  the  test  matrices  and  the  characteristics  of  floating-point 
arithmetic  on  the  360/67.  As  the  elements  of  A  and  B  were  uniformly 
distributed  in  ( -1/2,  +l/2) ,  about  7/8  of  them  would  have  absolute  values 
in  ( l/l6,  1/2) .  Since  the  360  i3  a  hexadecimal  machine,  any  two  such 
numbers  will  be  added  exactly.  This  means  that  at  least  49/64  of  the  sums 
(xgj-i  +  y2j)  and  (x2j  +  Ygj.^)  of  (3*21)  will  be  formed  exactly.  As 


k.12 


remarked  la  tee.  3.2,  this  Mena  that  we  ere  effectively  wing  et  leeet 
double-precieion  aoet  of  the  tlae.  Preecoebly  the  few  errora  eede  la 
coaputing  the  above  auea  were  not.  enough  to  effect  the  recoded  aingle- 
precision  reaulta,  although  it  seeea  atrenge  that  ell  the  elaaente  of 
e  50  x  50  product  abould  agree,  even  to  the  la  at  bit,  vhea  ooeputod  by 
two  auch  differeot  eethoda.  la  the  eoeplcot  caae  thie  enoealy  dieeppeere, 
for  e  rounding  error  will  uauelly  be  eede  ia  adding  either  the  reel  or 
the  leeginary  perta  of  the  above  aieaa. 


< 
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5.1 


Krai son -like  mttflii 


For  2x2  mtrix  Multiplication,  both  the  norael  nethod  tod  Krassen's 
netted  nay  be  described  ••  follows!  glroo  the  a4 ,  tod  b^,  m  fora  prod- 
uota  qj,  ...  ,  q^  of  tbt  fom 

S  "  * 

tod  t ban  tba  art  llnttr  oodblnatlons  of  tba  q^,  i.e.  thora  art 

constants  »  snob  that 

*P 


Substituting  (9*01)  in  (5.02),  equating  coefficients,  and  ualnf 
tba  dofinitloo  of  Matrix  Multiplication,  gives  tba  a  at  of  equations 


& 


^iJj^kLp’anp  "  Bnift Jk*bi  , 


where  6  la  Kronebker's  dalta.  (Tbt  subscripts  on  tba  cM  wart  reversed 

to  ineraaaa  tba  ayetry  of  (5.03).)  For  tba  Multiplication  of  N  x  I 

2 

aatrleaa  by  I  x  P  Matrices,  (5*03)  gives  (IMP)  aquatlona  aa  i,  3,  k, 

L ,  %  and  n  range  over  tba  intagara  1  <  l,n  <  M,  1  <  j,k  <  M,  1  <  L,n  <  3 
For  axaaple,  in  tba  2  x  2  oaaa  with  T  ■  7,  aa  have  6b  aquatlona  in  8b  un¬ 
knowns,  and  Straaaan'a  idantltiaa  abow  that  tbara  la  a  ablution.  8t rassen' 
solution  baa  tba  nloa  proparty  that  all  tba  or^^,  SVTy  ar*  0  or 

+1  .  Iota  that,  if  a  ablution  of  (5*03)  exists,  it  will  certainly  not 

% 

ba  unique. 


Stressed «  aotbod  applied  to  b  x  b  Matrices  shows  that  the 
aquations  (5*03)  have  an  (integral)  solution  whan  N  *  H  ■  P  ■  b, 

T  ■  b9  (there  are  b096  aquations  in  2352  unknowns).  In  general  Straaaan'a 
nathod  show*  that  there  is  a  solution  with  T  ■  7*  whan  M  •  !f  ■  P  ■  2*  . 
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(5.01) 

(5.02) 

(5.03) 


ir  there  la  a  ml  solution  with  N  •  I  »  P  and  a  carta  in  T«  than 
Matrices  of  order  n  can  b«  aultlpllad  la  C<a^%*)  arithmetic  opera tlooa 
by  a  slaple  extaaaloo  of  the  aatbod  deacrlbed  at  tba  beginning  of  Sao.  2*3* 
While  an  integral  or  rational  aolutloo  la  desirable,  la  theorya  ml  or 
*ven  a  couplex  aolutloo  would  aufflea. 


Hie  problf*  leadlnc  to  equation  (5*03)  oan  bo  gaoeraUaad  la  the 
following  way:  auppose  •••  ,  a^  and  b^  •••  ,  bj  era  aoo-ooaaut- 
in,  ™n.bl..,  ,lJk  1.  .  «l«n  th^^o.loo.1  .m,  of  r.1  or  «*U. 
numbers,  and  wa  want  to  coqpute  tba  K  auaa  of  produota  ■  J)  orijfe!Bi^j 
(k  ■  1,  ...  ,  K)  In  aa  few  aultlplloatiooa  aa  poaalbla.  than  wa  want 
tba  least  poaalbla  T  and  aoalara  alt,  aueb  that  froa  tba  T 

products 

’t ■  < ?Vi»5Vj) 

wa  can  font  the  q^  as  linear  ccObinatlooa  of  the  p^  , 
m  vktpt  *  1  -  k  -  K* 

Combining  (5.0k)  and  (5*05)  and  equating  coefficients  gives 
QitPJt7kt  "  aijk 

for  1  <  i  <  I,  1  <  J  <  J,  1  <  k  <  K, 
and  clearly  (5*05)  ia  a  special  case  of  (5*06). 


(5.0k) 


(5.0$) 


(5.06) 


TO  sharpen  the  upper  bound  (2.3 6)  for  the  constant  PQ  defined  by 
(2.35) ,  we  could  look  for  solutions  of  (5.03)  with  M  ■  H  ■  P  and 
logjjT  <  logg7  .  For  exaaple,  we  would  like  to  find  solutions  with  H  ■  2, 

T  »  6  or  H  ■  3,  T  ■  21  or  H  ■  k,  T  *  k8.  As  (  5*03)  i«  a  special  case  of 
(5«06)f  and  as  it  is  convenient  to  avoid  triple  subscripts  wherever  possible, 
we  shall  first  consider  (5*o6). 


5.5 


In  tbt  ena«  I  ■  1  it  it  net  difficult  to  show  that  the  elnlual  T 
for  which  •  aolution  of  (5.o6)  exlata  ia  tba  rank  of  tha  J  x  K  aatrlx 
(o^k)f  and  elnllarly  if  J  or  K  ■  1.  If  I,  J  and  K  arc  greater  than 
unity  than  tbaro  doaa  not  scan  to  b#  any  auch  alnple  thaorea,  and 
exaq>loa  with  I  ■  J  ■  K  ■  2  ahov  that  tha  nininal  T  nay  depend  on 
whether  the  ^  |j(  and  are  allowed  to  be  rational,  real,  or 
cceplex.  Thia  la  ao  even  if  all  the  o^  are  integral.  Hence  we 
era  lad  to  try  nonerlMl  nethoda  for  oolrlng  apeoial  casea  of  (5*06). 
If  tbeM  nethoda  find  a  real  eolation,  then  it  la  worthwhile  to  try 
to  find  en  integral  aolution,  but  if  no  real  eolation  exiata  there 
ia  no  point  in  looking  for  an  integral  aolution. 


/ 


4 

5A 


Because  of  the  lore*  ntnfeer  of  equations  (  A096  for  R  •  4), 

~on  vent  Iona  1  mas  rl  cal  aethods  Xiko  Bevten's  netted  ara  inprectloel 
for  finding  a  solution  of  (9*06).  The  prdblM  My  bo  regarded  aa  00a 
of  function  alnlxl  ration:  va  want  to  nlnlnlaa  tba  sun  of  aqparaa  of 
residuals  of  the  set  of  equations  (9*06).  It  £  and  £  ora  fixed  #  than 
($.c6)  Is  a  sat  of  linear  equations  in  the  <  Ranoo  we  ooold  find  a 
least-squares  solution  of  this  (overdetemlnod)  system  than  fix  £  # 
u  and  find  a  least  squares  solution  for  |,  than  for  and  repeat  the 
cycle.  The  sun  of  squares  of  residuals  will  converge  to  sons  non* 
negative  mariber#  and  hopefully  this  will  be  taro.  Ivan  this  Bathed 
would  be  impractical,  except  that  the  coefficient  of  In  the  spates 
of  linear  equations  happens  to  be  independent  of  i.  In  other  words# 
the  Mtrlx  of  coefficients  has  I  identical  T  x  T  blocks  along  the  Min 
diagonal#  and  seros  elsewhere#  so  each  least-squares  prCblen  splits  up 
into  a  number  of  sasller  ones. 


Writing  xt  for  C'it,  we  vent  the  least  squares  solution  of  Ax  ■  h# 

’*•"  A  *  •  <’-u> 

The  solution  is  given  by  *  m  (A^A)'^A^  (In  the  reel  ess.)  ,  ($*12) 

and  we  have 

*T*  ’  ^  5  (5.13) 

and  a\  •  (  £  Vkt’iJkU  •  ii.lk) 

As  noted  above#  (5.13)  is  independent  of  i#  but  (5.1fc)  depends  on  i. 
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convergence: 

1/  i  :•  0;  feeze  (ty  0Q  . 

2/  Find  6°  to  alniaize  1(0^  +  6a,0j) . 

3/  find  l*  to  Binlalze  *(0^  +  0°,^  d®)  • 

4/  Find  w  to  alniaize 

where  arl+1  -  0^  ♦  w6®,0i+1  -  ^  +  vd*  . 

5/  i  i  +  1  . 

6/  Oo  beck  to  2/  . 


9*6 


In  tha  • lapis  mm  of  •  qntdntle  function  s(o,|),  this  alforitte 
will  find  the  bIaIbub  1a  om  cycle. 

Tba  cim  ldM  osn  bt  uaad  in  our  aore  pneral  preblea.  Xf  *(o  ,  |,  g) 
lc  the  sub  of  squirts  of  residuals,  m  find  J°  to  Blnlalst  i(g*i°it2)  • 
Urn  6*  to  bIaIbIm  a(c  ♦  4°,l  ♦  , 

then  £  to  bIaIbIm  s(g  ♦  i°»t  ♦  i.i  *  i”) . 
then  w  to  bIaIbIm  where  «*■  0  ♦  ui°  otc. 

Since 

-  i(Ck  [C  <olt  ♦  ♦  «*k)]  . 

we  can  express  s  as  a  sixth  decree  polynomial  in  v,  and  than  v  mb  be 
chosen  to  ainlalM  this  polynomial  (globally). 


($.81) 


J8 


6.1 


6.  8— rch  for  »ew  Alaorlthns  ter  the  1— «t  toam  Hethod 

A  program  «a  written  to  try  to  find  •  solution  of  (5*03)  using 
the  least-squares  approach  described  in  8oc.  $.  Although  it  would  bo 
intoroatlng  to  look  for  coaplox  solutions,  only  the  real  east  woo 
considered. 

the  positive  definite  lynstrlc  Matrix  ATA  ia  found  free  (5.13) 
and  A*b  ia  found  free  ($.14),  taking  advantage  of  the  identity. 

*kLtt7anulnl<JklXa  *  £*JLurLiu  * 


6.1  COlculaticn  of  ate.!.*) 


Wo  shall  use  two  or  three  eUbscripts  on  the  a,  0  snd  y  ee  con¬ 
venient.  The  sun  of  squares  of  residuals  of  ($.06)  is 

2 


so 


■  °Uk]  ' 

‘(a,t,r) '  t5k  .? 

S,*  (•*» 


-  2 


♦  I 


The  straightforward  evaluation  of  (6.11)  for  mtrlx  Multiplication  with 
N  -  H  -  P  takes  ~2>6T  operations  (just  counting  Multiplications) •  Using 
(6.12)  instead,  the  last  two  tens  give  no  problems,  in  fact 


E 

i,j,k 


r 


ii 


'W/  *  «■** 


(6.01) 


(6.11) 


(6.12) 


(6.15) 
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6.2 


•**  £'1*  5  a^Jtykt 

'  i,j,x,£«,n,t  Vttt'-tWu 
*  J^^a***  . 

•nd  the  evaluation  of  (6.14)  requires  only~2*^T  operations.  The  first 
tsm  in  (6.12)  is 

t  aitPjtrkt)2  "  tEu^°it0itt^it^Jn^7ltfc7ku^  * 

snd  tbs  right  aids  of  (6.15)  involves  *yfiP/2  operations  (506  art 
smd  by  ay— try) .  since  vs  art  intsrsstsd  in  mints  of  s  con 

be  found  from  (6.12)  -  (6.1$)  inv5i7*®/2  operations  instead  of  *2*®'® 
us  inf  (6.U)  •  Hence  it  la  aucb  faster  to  use  (6.12)  -  (6.15),  although 
this  involves  a—  loss  of  accuracy. 

6.2  Quadratic  Anoroninetlon 

At  first  the  coefficients  of  v  in  the  sixth  degree  polynomial  p(v) 
of  (5*21)  were  calculated  using  8>  |>  Z/  £°»  •?  and  £7,  and  the  global 
nininun  of  p(v)  was  found.  Evaluation  of  the  coefficients  of  p(v)  was 
rather  tine-con  tuning,  and  it  was  noticed  that  the  mini—  usually  occurred 
for  1  <  w  <  2,  and  in  this  range  p(w)  was  approx  lasted  very  wall  by  the 
quadratic  fitting  p(0),  p(l)  and  p(2). 

Since  p(0)  ■  *(2tbz)  already  Known, 
and  both  p(l)  ■  s(g«-6a,g+i®»  PI*) 

and  p(2)  ■  s ( ar*26a , g+26^, 26r)  nay  be  found  by  the  method  of  Sec.  6.1, 

the  program  was  speeded  up  considerably  by  using  the  quadratic  approximation, 
and  the  rate  of  convergence  was  not  noticeably  diminished. 
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(6.14) 

(6.15) 


As  •  precaution,  naoamry  tor  tba  first  few  Iterations  anyway,  w 
was  constrained  to  lie  In  [1,  3)  .  Once  w  was  chosen,  a(ofv6Q,  J+wdf, 
was  ooaqputed  (using  previously  calculated  Inner-products  Ilka 


£  °luair) ,  and  a  check  nade  that  It  waa  lass  than  p(l)  and  p(2)  . 
Altar  the  first  few  Iterations  these  precautions  usually  turned  out  to 
be  unnecessary.  Rote  that,  once  »x  ■  E  (  u  +  x*Ju)(olv  ♦  Is 

found  for  x  ■  0,  1  and  2,  we  can  find  any  sx  fron 

•x  ■  ’  ^#o  *  *2  *  *  y)*2>#  y  -  *  "  1  • 

This  device  was  also  used  to  save  sane  tine.  There  Is  a  danger  of 
numerical  Instability  unless  J^y2  -  y)|  <  1,  l.e.  unless  0  <  x  <  3  , 
which  Is  one  reason  why  w  was  constrained  to  lie  in  [1,  31  • 


If  N  ■  R  ■  P,  the  msSber  of  operatlona  (just  counting  Multiplications) 
per  collate  cycle  Is  +T)T2/2  .  Since  N2  <  T  <  R^  for  the  cases  of 

Interest,  this  grows  very  rapidly  with  R.  On  the  other  hand,  we  are  trying 
to  aolve  R^  nonlinear  equations  In  3R2!  unknowns,  so  It  would  be  surprising 
If  any  other  Method  could  do  auch  better. 

6.3  8u—  n  of  Results 

The  attempt  to  lower  the  bound  (2.3 6)  war  unsuccessful,  but  sons 
Interesting  negative  results  were  obtained.  For  2x2  as  trices,  nany 
solutions  were  found  with  T  -  7,  but  s  never  fell  below  1  for  T  ■  6, 
strongly  Indicating  that  Stresses' s  method  gives  the  nlnlaial  nuaber  of 
multiplications  for  2x2  Matrices  (at  least  for  real  a,  £  and  2)  •  With 
T  ■  7  each  iteration  took  about  0.2  sec.  and  convergence  was  fairly  fast, 
and  appeared  to  be  linear. 

% 
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6.4 


Trying  T  ■  1,  2,  ...  7  for  2x2  matrices,  it  vac  found  that 

(7  if  T  ■  5#  6  or  7*' 

8  if  T  ■  1,  2  or  3 
7.59  if  T  -  4  . 


Thuu  the  minimal  cum  of  squares  of  residuals  is  usually  integral,  but 
appears  to  be  nonintegral  for  T  ■  4 . 


3x3  matrices  may  be  multiplied  in  26  multiplications  by  using 
Strassen's  method  on  a  2  x  2  submatrix.  Zt  appears  that  there  is  also 
a  solution  with  T  «  25:  the  program  (taking  3  sec. /iteration)  reduced 
s  to  0.183  in  33  Iterations,  snd  s  was  still  slowly  decreasing.  Knuth 
has  found  a  solution,  involving  cube  roots  of  unity,  with  T  ■  24.  How¬ 
ever,  logj24  > 

with  T  <  21  is  necessary  to  improve  the  bound  (2.36) .  When  the  program  was 
run  with  T  ■  21,  s  appeared  to  be  tending  to  2  rather  than  to  zero.  If 
the  rule  inf(s(T))  +  T  >  Tmin  ,  which  was  observed  for  the  2x2  case, 
holds  generally,  this  would  Indicate  that  for  3x3  matrices  Tit>1w  <  23  • 

For  4x4  matrices  the  program  was  run  with  T  -  48,  to  try  to  improve 
on  Strassen's  49.  Unfortunately,  each  iteration  took  18  sec.,  and  con¬ 
vergence  was  slow,  so  lack  of  computer  time  forced  a  return  to  smaller 
problems. 

Various  cases  of  small  rectangular  matrices  were  investigated.  For 
example,  the  program  was  run  with  M  ■  P  ■  2,  N  ■  4  and  with  H  ■  F  ■  4, 

N  ■  2  .  In  these  cases  the  smallest  T  for  which  s  appeared  to  be  tending 
to  zero  was  exactly  the  T  to  be  expected  by  partitioning  the  matrices  and 


loggj. 


and  in  fact 


lo^21  <  logg7  <  1°6j22, 


so  a  solution 
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applying  Straaaen'a  method.  Convergence  often  slowed  as  s  approached  1, 
and  apeeded  up  again  once  s  <  1,  and  there  was  no  case  in  which  a  <  1 
was  attained,  but  for  which  s  tailed  to  tend  to  zero.  Perhaps  s(a  /  b  2) 
has  some  local  minima  or  saddle  points,  but  they  all  have  s  >  1. 

To  summarize  the  results:  although  nothing  has  been  rigorously 
proved,  it  appears  likely  that,  to  improve  on  the  bound  (2.36),  matrices 
of  size  at  least  x  4  must  be  investigated.  It  is  plausible  that  there 
are  no  (real)  methods  better  than  Strassen's  for  the  2x2  or  3x3 
case,  and  if  this  is  so  it  ie  unlikely  that  any  new  method  could  be  of 
much  practical  use,  although  it  would  certainly  be  of  theoretical  interest. 
A  practical  method  needs  to  have  rational  a,  £  and  and  to  be  fast  for 
reasonably  small  matrices  moat  of  the  components  of  a,  £  and  2  should 
vanish. 


7.1 


7.  Conclusion 

While  the  noxml  method  tekea  0(  opexetioni  to  multiply  n  x  n 
matrices,  Strassen's  method  shows  that  0(n2*®)  suffice*  In  practice, 
though,  the  normal  method  is  faster  for  n  <  100  •  Winograd's  method, 
while  still  taking  O(n^)  operations,  trades  multiplications  for 
additions  and  is  definitely  faster  than  the  normal  method  for  moderate 
and  large  n,  with  a  gain  of  up  to  about  10%  for  real  matrices  and  up  to 
about  20%  for  complex  matrices.  The  gain  would  be  greater  for  double 
or  multiple-precision  arithmetic. 

Floating-point  error  bounds  can  be  given  for  Strassen's  and  Winograd's 
methods,  and  the  bounds  are  comparable  to  those  for  the  normal  method  if 
the  same  precision  arithmetic  is  used.  With  Winograd's  method  the  necessity 
for  prescaling  can  not'  be  emphasized  too  strongly  (see  also  [12]). 

Provided  scaling  is  used,  Winograd's  method  can  be  recommended,  es¬ 
pecially  in  the  complex  case,  unless  very  high  accuracy  is  essential.  It 
is  much  easier  to  code  than  Strassen's  method.  Possibly  Strassen's  method 
would  be  preferable  when  working  with  large  matrices  on  a  paged  machine. 

4 

Attempts  to  lower  the  constant  logg7  ■  2.8...  given  by  Strassen's 
method  were  unsuccessful.  A  completely  new  approach  seems  necessary  in 
order  to  bring  the  upper  and  lower  bounds  on  the  computational  complexity 
of  matrix  multiplication  much  closer  together.  For  matrices  of  reasonable 
size,  though,  it  seems  unlikely  that  any  new  method  could  be  very  much 
faster  than  the  known  methods  on  a  serial  computer. 


1* 
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APPENDIX 


ALOOLW  procedures  add  teat  program 


•i' 
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0001  1- 
0002  — 
0003  — 
000%  — 
0003  — 
0006  — 
0007  — 
0001  2- 
0009  — 
0010  — 
0011  — 
0012  — 
0013  — 
301%  — 
0015  — 
0016  — 
0017  — 
0016  — 
0019  — 
0020  — 
0021  — 
0022  — 
0023  -- 
002%  — 
0025  — 
0026  — 
0027  — 
0028  — 
0029  — 
0030  — 
0031  — 
0032  — 
0033  — 
003%  — 
0035  — 
0036  — 
0037  — 
0038  — 
0039  — 
00%0  — 
00%  1  — 
00%2  — 
00%3  — 
00%%  — 
00%5  — 
00%6  •• 
00%7  — 
00%  I  — 
00%9  — 
0050  — 
0051  — 
0052  — 
0053  — 
005%  — 
0055  — 
0056  3- 
0057  — 
0058  — 
0059  — 
0060  — 
0061  — 
0062  -- 
0063  -3 
006% 
0065  — 
0066  3- 


6E0IN  COMMENT t 

TE8T  PROflRAM  FOR  PROCEOUNE  8TRAS6EN#  WINOORAD  A  MATMULT, 
PILE  IS  RRENT. TESTS TRAS8EN  ON  SVSOOj 


PROCEDURE  STRASSEN  (REAL  ARRAY  A,  B,  C  (*#  •); 

INTEOER  VALUE  M,  P>; 

BEGIN  COMMENT* 

IP  A  IS  AN  M  X  N  MATRIX#  AND  B  IS  AN  N  X  P  MATRIX# 

THEN  THE  M  X  P  PRODUCT  MATRIX  A.B  IS  RETURNED  IN  C. 

A  MODIFIED  FORM  OP  STRASSEN* S  METHOD  IS  USED  WHEN 
M#  N#  AND  P  ARE  SUFFICIENTLY  LARDE.  IT  IS  BASED  ON  THE 
FOLLOWING  IDENTITIES  WHICH  HOLD  IN  THE  2X2  CASE* 

Cll  -  01  -  03  -  GS  ♦  Q7# 

C12  ■  Q%  -  Ql# 

C21  ■  02  ♦  Q3#  AND 

C22  •  03  ♦  Q8  -  02  -  Q%#  WHERE 

01  -  (All  -  A12J.B22# 

02  •  (A21  -  A22).B11# 

03  -  A22. (Rll  ♦  621)# 

0%  -  All. (812  ♦  622)# 

05  •  (All  ♦  A22).  (B22  -  BID# 

06  •  (All  ♦  A21). (Bll  ♦  612)#  AND 

07  -  (A12  ♦  A22MB21  ♦  B22) 

A#  R  AND/OR  r.  MAY  BE  IDFNTICAL  OR  OVERLAPPING  IN  THE 
CALL  TA  STRASSEN.  IN  THE  CASE  M-N»P  THE  INTERMEDIATE 
STORAGE  REOUIRED  IS  ABOUT  8N**2/3  REAL  WORDS.  THIS 
COULD  RE  REDUCED  TO  N**2  (OR  MORE  GENERALLY 
(MN  ♦  NP  ♦  PM)/S  )  BY  BUILDING  UP  THE  PRODUCT  AFTER 
EACH  CALL  TO  STRASSEN  IN  EVENMULT#  BUT  THEN  C  COULD 
NOT  OVERLAP  A  OR  B#  AND  THE  PROCEDURE  UOULD  BE 
RATHER  SLOWER. 

IF  3MIIP/(MN*NP«PM)<«N0  THEN  NORMAL  MATRIX  MULTIPLICATION 
IS  USFD.  THIS  IS  BECAUSE  STRASSEN'S  IDENTITIES  SAVE 
TIME  ONLY  IF  A  MULTIPLICATION  TAKES  LONGER  THAN  1% 
ADDITIONS#  WHICH  IS  CERTAINLY  FALSE  FOR  MATRICES  SMALLER 
THAN  1%  X  1%#  OR  A  LITTLE  LARGER.  THE  NUMBER  NO 
IS  MACHINE  AND  COMPILER-DEPENDENT#  RUT  100  IS  ABOUT 
OPTIMAL  FOR  ALGOLW  ON  THE  360/67  (WITH  NO  ARRAY  BOUNDS 
CHECKING). 

THE  TIME  FOR  PROCEDURE  STRASSEN  IS  ABOUT  THE  SAME  AS 
FOR  THE  NORMAL  METHOD  FOR  SMALL  M#  N  AND  P,  BUT  FOR 
LARGE  M#  N  AND  P  THE  TIME  MULTIPLIES  BY  7  (RATHER 
THAN  8)  EACH  TIME  M#  N  AND  P  ARE  DOUBLED.  ACCURACY 
IS  NOT  MUCH  WORSE  THAN  FOR  MATRIX  MULTIPLICATION  BY 
THE  USUAL  METHOD  WITH  ALL  OPERATIONS  DONE  IN  SINGLE 
PRECISION. 

R  BRFNT#  JULY  1969; 

REAL  PROCEDURE  IP(RFAL  ARRAY  A#  *(•);  INTEGER  VALUE  N); 

BEGIN  COMMENT t 

RETURNS  THF  INNER  PRODUCT  OF  THE  N- VECTORS  A  ANn  R; 


LONG  REAL  S; 

S  t ■  OL; 

FOR  I  *■  1  UNTIL  N  DO  S  *•  S  ♦  A(|)*B(I); 

ROUNDTOREAL(S) 

END  IP; 

PROCEDURE  OP(REAL  ARRAY  A#  B#  C(*);  INTEGER  VALUE  Ml#  M2#  M3#  F); 
BEGIN  COMMENT* 

47 


0067  — 
0011  — 
0069  •• 
0070  — 
0071  — 
0072  — 
0075  — 
0076  — 
0075  6* 
0076  5- 
0077  6- 
0071  -6 
0079  — 
0060  -S 
0011  — 
0012  5- 
0015  0- 
0086  -6 
0085  — 
0086  -5 
0087  -6 
0088  — 
0089  6- 
0090  5- 
0091  6- 
0092  -6 
0095  — 
0096  -5 
0095  — 
0096  5- 
0097  6- 
0090  -6 
0099  — 
0100  -5 
0101  -6 
0102  — 
0105  6- 
0106  5- 
0105  -5 
0106  — 
0107  -6 
0108  -5 
0109  — 
0110  — 
0111  — 
0112  — 
0115  — 
0116  — 
0115  — 
0116  — 
0117  — 
0118  5- 
0119  — 
0120  — 
0121  — 
0122  — 
0125  -5 
0126  — 
0125  — 
0126  — 
0127  5- 
0128  — 
0129  — 
0150  6- 
0151  — 
0152  — 


EFFECTIVELY  DOESt 

FOR  I  t ■  1  UNTIL  HI  DO 

AC  I >  t ■  1(1  ♦  M2)  ♦  FtC(l  ♦  NS) 

WHERE  F  ■  0,  ♦!  OR  -1. 

NOTE  THAT  IN  ALOOLW  1*0  ARRAY  ACCESSES  ARE  MUCH 
FASTER  THAN  2-0  ACCESSES) 


IF  F  >  0  THEN 
SEOIN  IF  M2  ■  0  THEN 
RESIN  IF  M3  -  0  THEN 

SERIN  FOR  I  I-  1  UNTIL  Ml  DO  AC  I >  !■  1(1)  ♦  C(l) 

END 

ELSE  FOR  I  t-  1  UNTIL  Ml  DO  AO)  t>  BO)  ♦  CO  ♦  M5) 

END 

ELSE 

SERIN  IF  MS  ■  0  THEN 

SERIN  FOR  I  *■  1  UNTIL  Ml  DO  A(l)  BO  ♦  M2)  ♦  CO) 
END 

ELSE  FOR  I  t-  1  UNTIL  Ml  DO  AO)  t*  BO  ♦  M2)  ♦  CO  ♦  MS) 
END 
END 

ELSE  IF  F  <  0  THEN 
BEQIN  IF  M2  ■  0  THEN 
BERIN  IF  MS  «  0  THEN 

BERIN  FOR  I  i-  1  UNTIL  Ml  DO  AO)  t-  BO)  •  CO) 

END 

ELSE  FOR  I  t-  1  UNTIL  Ml  DO  A(  I )  »■  BO )  -  C(  I  ♦  MS) 

END 

ELSE 

BERIN  IF  MS  ■  0  THEN 

BERIN  FOR  I  i  ■  1  UNTIL  Ml  DO  A(  I )  t-  BO  ♦Ml)  -  CO) 
END 

ELSE  FOR  I  1  UNTIL  Ml  DO  AO)  i-  B(l  ♦  M2)  -  CO  ♦  MS) 
END 
END 
ELSE 

BERIN  IF  M2  ■  0  THEN 

BERIN  FOR  I  :»  1  UNTIL  Ml  OO  AC  I )  :•  B(l) 

END 

ELSE  FOR  I  t-  1  UNTIL  Ml  OO  AO )  »•  BO  ♦  M2) 

END 

END  OP) 


COMMENT:  IF  M,  N,  OR  P  SMALL  USE  NORMAL  MATRIX  MULTIPLICATION. 

THE  CONSTANT  NO  MENTIONED  ABOVE  IS  REOUCED  TO  29  FOR 
CHECK I  NR  PURPOSES; 


IF  (5*M*N*P)  <■  (29*(M*N  ♦  N*P  ♦  P*M))  THEN 

BERIN  COMMENT:  WE  USE  A  TEMPORARY  ARRAY  0  IN  CASE  C  •  A  OR  B; 
REAL  ARRAY  R  (1  ::  M,  1  ::  P); 

FOR  I  :■  1  UNTIL  M  DO  FOR  J  :■  1  UNTIL  P  DO 
<KI,U)  :■  IP<AO,*),  RO,J),  N); 

FOR  I  :•  1  UNTIL  M  00  0P(C(l,*),  0(1,*),  Q(l,*),  P,  0,  0,  0) 
END 

ELSE 

BERIN  COMMFNT :  USE  STRASSEN'S  METHOD; 

PROCEDURE  IDENTITIES; 

BERIN  COMMENT: 

THE  IDENTITIES  ARE  PUT  HERE  TO  AVOID  SERMENT 
OVERFLOW; 
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0133  — 
013li  — 
0135  — 
0136  — 
0137  — 
013*  — 
0139  — 
0160  — 
0161  — 
0162  — 
0163  — 
0166  — 
0165  — 
0166  — 
0167  — 
0166  — 
0169  — 
0150  — 
0151  — 
0152  — 
0153  — 
0156  — 
0155  — 
0156  — 
0157  — 
015*  — 
0159  — 
0160  — 
0161  — 
0162  — 
0163  — 
0166  — 
0165  — 
0166  — 
0167  — 
016*  — 
0169  — 
0170  — 
0171  5- 
0172  — 
0173  — 
0176  — 
0175  — 
0176  -5 
0177  -6 
017*  — 
0179  — 
0180  — 
0181  — 
0182  — 
0183  — 
0186  — 
0185  — 
0186  — 
0187  6- 
0188  — 
0189  -6 
0190  — 
0191  — 
0192  6- 
0193  — 
0196  -6 
0195  — 
0196  — 
0197  — 
1198  — 


REAL  ARRAY  T  (1  it  M2,  1  tl  N2); 

REAL  ARRAY  U  (1  it  N2,  1  it  P2>| 

REAL  ARRAY  Ql,  02,  03,  06,  05,  06  (1  it  M2,  1  It  P2)j 

FOR  J  t*l  UNTIL  N2  00 

OF  (T(«,  J),  A(*,  J),  A(*,  J  ♦  N2),  M2,  0,  0,  -1); 

FOR  I  I-  1  UNTIL  N2  00 

OP  (U(l,  •),  0(1  ♦  N2,  •),  8(1,  O,  P2,  P2,  0,  0)j 
STRA88EN  (T,  U,  01,  M2,  N2,  P2>; 

FOR  I  i ■  1  UNTIL  M2  DO 

OP  (TO,  *),  A(l  ♦  M2,  •  >,  A(l  ♦  M2,  •),  N2,  0,  N2,  -1); 
STRASSEN  (T,  0,  02,  M2,  N2,  P2>> 

FOR  I  I-  1  UNTIL  M2  DO 

OP  (TO,  •),  AO  ♦  M2,  •),  AO,  *),  N2,  N2,  0,  0)| 

FOR  I  l-  1  UNTIL  N2  DO  > 

OP  (U(l,  •),  BO,  •),  BO  ♦  N2,  •),  P2,  0,  0,  1); 

STRASSEN  (T,  U,  03,  M2,  N2,  P2>; 

FOR  J  I*  1  UNTIL  P2  DO 

OP  (U(«,  J),  B(*,  J  ♦  P2>,  B(*,  J  ♦  P2),  N2,  0,  N2,  1); 
STRASSEN  (A,  U,  Q6,  M2,  N2,  P2>) 

FOR  I  t ■  1  UNTIL  M2  DO 

OP  {TO ,  •),  AO,  *),  A(l  ♦  M2,  •),  N2,  0,  N2,  1); 

FOR  I  t-  1  UNTIL  N2  00 

OP  (U(l,  •>,  0(1  ♦  N2,  *),  BO,  O,  P2,  P2,  0,  -1)| 
STRASSEN  (T,  U,  05,  M2,  N2,  P2); 

FOR  I  i-  1  UNTIL  M2  DO 

OP  (T( I ,  •),  AO,  *),  AO  ♦  M2,  •),  M2,  0,  0,  1); 

FOR  J  i«  1  UNTIL  P2  DO 

OP  (U(»,  J),  B(*,  J),  B(*,  J  ♦  P2>,  N2,  0,  0,  1); 

STRASSEN  (T,  U,  06,  M2,  M2,  P2)j 
FOR  J  1  UNTIL  N2  00 

OP  (TO,  J),  AO,  J  ♦  N2),  AO,  J  ♦  N2>,  M2,  0,  M2,  1); 

FOR  I  !•  1  UNTIL  N2  00 

OP  (U(l,  *),  R(l  ♦  N2,  •),  BO  ♦  N2,  *>,  P2,  0,  P2,  1); 
STRASSEN  (T,  U,  C  ,  M2,  N2,  P2)j 

FOR  I  | ■  1  UNTIL  M2  DO  FOR  J  l»  1  UNTIL  P2  00 
BEOIN 

C  <I,J>  i-  Old , J)  -  03  (l,J>  ♦  C  ( I, J)  -  050, J); 

C  (l,J  ♦  P?)  t-  060, J)  -  OKI, J); 

CO*  M2, J)  t>  020, J)  ♦  Q3( I, J); 

C  COM2, J*P2)  :■  050, J)  ♦  060, J)  -  (020, J)  ♦  060, J)) 
EN0 

END  IDENTITIES; 

REAL  ARRAY  Sl(l  u  P); 

REAL  ARRAY  S2(l  it  M); 

INTEGER  M2,  N2,  P2; 

M2  :•  M  DIV  2;  N2  :■  N  01V  2;  P2  :■  P  DIV  2; 

COMMENTi  THIS  PART  MUST  BE  DONE  NOW  IN  CASE  C  ■  A  OR  B; 

IF  (2*M2)  <  M  THEN 
BEOIN  FOR  J  i-  1  UNTIL  2*P2  00 
S1(J)  I-  I P(A(M, *),  B(*,J),  N) 

END; 

IF  (2*P2)  <  P  THEN 
BEOIN  FOR  I  l-l  UNTIL  M  DO 
S2(l)  l-  IP(A(I,*>,  B(*,P),  N) 

END; 

IDENTITIES; 

M2  t ■  2*M2;  N2  ;■  2*N2;  P2  : ■  2*P2; 


0199  — 
0200  — 
0201  — 
0202  — 
0209  % - 
020%  — 
0205  — 
0206  -% 
0207  — 
0201  — 
0209  %- 
0210  -% 
0211  — 
0212  — 
0213  %- 
021%  -% 
0215  -3 
0216  -2 
0217  — 
0218  — 
0219  — 
0220  2- 
0221  — 
0222  — 
0223  — 
022%  — 
0225  — 
0226  3- 
0227  -- 
0228  — 
0229  — 
0230  — 
0231  — 
0232  — 
0233  — 
023%  — 
0235  — 
0236  — 
0237  — 
0238  — 
0239  — 
0240  — 
0241  — 
02%2  -3 
0243  — 
02%%  — 
02%5  3- 
0246  — 
02%7  — 
0248  — 
0249  — 
0250  — 
0251  — 
0252  -3 
0253  — 
025%  — 
0255  — 
025C  — 
0257  — 
0258  — 
0259  — 
0260  -- 
0261  — 
0262  — 
0263  — 
026%  — 


COMMENT!  IP  ft,  N#  ON  P  MAS  ODD  ME  HAVE  TO  FIX  UP  THE  SORDENSi 

IP  N2  <  N  THEN 
REQIN 

FOR  I  !■  1  UNTIL  M2  DO  FOR  J  !■  1  UNTIL  P2  DO 
C( I # J)  !•  C(I,J)  ♦  A( I «N)*B(N/ J) 

END; 

IF  M2  <  M  THEN 

REOIN  FOR  J  !■  1  UNTIL  Pi  DO  C(M#J)  !•  S1(J) 

END; 

IF  P2  <  F  THEN 

REOIN  FOR  I  !•  1  UNTIL  M  00  C(I#P)  I-  S2(l) 

END 

END 

END  STRASSEN; 


PROCEDURE  WINOORAD  (REAL  ARRAY  A,  H,  C(*#*>;  INTEGER  VALUE  M#  N#  P); 
REOIN  COMMFNTl 

IF  A  IS  AN  M  X  N  MATRIX  AND  R  AN  N  X  P  MATRIX.  THEN 
THEIR  PRODUCT  A.R  IS  RETURNED  IN  C.  Ill  NOG  RAD'S  METHOD 
IS  USED  IIITH  PRESCALINQ  TO  ENSURE  GOOD  ACCURACY; 


REAL  PROCEDURE  WP(RFAL  ARRAY  A»  R(*>;  LONG  REAL  VALUE  X,  Y); 

BEGIN  COMMENT! 

RETURNS  THE  INNER  PRODUCT  OF  THE  N-VECTORS  A  AND  B, 
USING  PRECOMPUTED  X  AND  Y.  14  IS  GLOBAL; 


LONG  REAL  S; 

S  ! ■  -(X  ♦  Y); 

COMMENT!  IF  THE  NEXT  STATEMENT  IS  REPLACED  BVi 
FOR  I  t ■  2  STEP  2  UNTIL  2*(N  DIV  2)  DO 

S  !•  S  ♦  (LONG(A(l-l>)  ♦  LONG(B( I ) ) ) * ( LONG ( A( I ) )  ♦  LONG(B< 1-1) >>., 
THEN  THE  CORRECTLY  ROUNDED  SINGLE-PRECISION  RESULT  IS  USUALLY 
RETURNED  (ASSUMING  PRESCALING).  UNFORTUNATELY  THIS  SLOI4S  DOWN 
THE  ALGORITHM  SO  THAT  IT  IS  NO  LONGER  FASTER  THAN  THE  USUAL  ONE; 

FOR  I  i*  2  STEP  2  UNTIL  2*(N  DIV  2)  DO 
S  :■  S  *  (A{ I  -  1)  ♦  B( I >)*(A( I )  ♦  B( I  -  1>>; 

IF  (N  REM  2)  >  0  THEN  S  !■  S  ♦  A(N)*B(N); 

ROUNDTOREAL(S) 

END  UP; 

LONG  REAL  PROCEDURE  XI (REAL  ARRAY  A(O); 

BEGIN  COMMENT! 

USED  TO  PRECOMPUTE  THE  FUNCTIONS  OF  A  REQUIRED  BY  WP; 


LONG  REAL  S; 

S  !■  OL; 

FOR  I  !*  1  STEP  2  UNTIL  N  -  1  DO  S  t>  S  ♦  A(I)*A(I  ♦  1); 

S 

END  XI; 

PROCEDURE  MAX  (REAL  ARRAY  A(*9V  REAL  VAlfft  RESULT  BD); 

FOR  I  i ■  1  UNTIL  N  DO  IF  BD  <  ABS(A(  I ) )  THEN  BD  !■  ABS(A(I)); 

PROCEDURE.. UUK REAL  ARRAY  A#  B (•);  REAL  VALUE  M); 

FOR  I  !•  1  UNTIL  N  DO  A( I )  !■  M*B(|); 

REAL  AMAX.  UMAX,  MULT; 

COMMENT!  THE  ARRAYS  D  AND  E  ARE  USED  AS  TEMPORARY  STORAGE  IN  CASE 
SOME  OF  A,  B  AND  C  COINCIDF; 

REAL  ARRAY  D(1  t!  M,  1  it  N); 

REAL  ARRAY  E(1  ::  N,  1  st  P); 
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0285  — 
0288  — 
0287  — 
0261  — 
0269  — 
0270  — 
0271  — 
0272  — 
0275  — 
0274  — 
0275  — 
0276  — 
0277  — 
0271  — 
0279  — 
0210  — 
0211  — 
0282  — 
0213  — 
0214  — 
0215  -2 
0286  — 
0217  — 
0211  — 
0219  — 
0290  2- 
0291  — 
0292  — 
0293  — 
0294  3- 
0295  — 
0296  — 
0297  — 
0298  — 
0299  -- 
0300  — 
0301  -3 
0302  — 
0303  — 
0304  — 
0305  — 
0306  — 
0307  — 
0308  — 
0309  — 
0310  — 
0311  -2 
0312  — 
0313  — 
0314  — 
0315  — 
0316  — 
0317  — 
0318  2- 
0319  — 
0320  — 
0321  — 
0322  — 
0323  — 
0324  — 
0325  -2 
0326  — 
0327  — 
0328  2- 
0329  — 
0330  — 


LOHO  REAL  ARRAY  X(1  it  M)j 
LONG  REAL  ARRAY  Y(1  it  R); 

COMMENT!  A  AND  6  ARE  SCALEO  RY  SUITA6LE  ROWERS  OP  TWO  TO  GIVE  ROOD 
NUMERICAL  PROPERTIES#  AND  THE  SCALEO  MATRICES  STORED  IN 
»  ANO  E l 

AMAX  l •  8MAX  I-  0.0; 

FOR  I  t-  1  UNTIL  M  DO  MAX(A(I#*)#  AMAX); 

FOR  K  (■!  UNTIL  R  DO  MAX(8(*#K),  6MAX); 

MULT  l-  IF  (AMAX  >  0)  ANO  (RMAX  >  0)  THEN 

2**(TRUNCATE((L0ft(SMAX)  •  LOO(AMAX) )/L00(4)  ♦  200.5)  -  200) 
ELSE  1.0; 

FOR  I  i-  1  UNTIL  M  00  MULOKI,*),  A<|,0,  MULT); 

FOR  K  !■  1  UNTIL  R  00  MUL(E(*,K),  B(*,K)#  MULT); 

COMMENT!  NOW  SOME  CONSTANTS  ARE  PRECOMPUTED  ANO  SAVED  IN  X  AND  Y; 
FOR  I  t-  1  UNTIL  M  00  X(l)  f  Xl(0(l#*)); 

FOR  Ki'l  UNTIL  R  00  Y(K)  l-  XI(E(*#K)); 

COMMENT!  NOU  THE  INNER  PRODUCTS  ARE  FOUND; 

FOR  I  !■  1  UNTIL  M  00  FOR  J  !•  1  UNTIL  P  DO 
C(I#J)  l-  WP(0(l#«)#  E(*#J)#  X( I )#  Y( J) ) 

END  WINOQRAD; 


PROCEDURE  MATMULT  (PEAL  ARRAY  A#  0,  C(*#*); 

INTEOER  VALUE  M,  N#  P); 

BEGIN  COMMENT! 

FORMS  C  !■  A. 6  IN  THE  USUAL  WAY; 

REAL  PROCEOURE  IPCREAl  ARRAY  A#  BCO;  INTEOER  VALUE  N); 

BEGIN  COMMENT: 

RETURNS  THE  INNER  PRODUCT  OF  THE  N-VECTORS  A  ANO  B; 


LONG  REAL  S; 

S  !■  OL; 

FOR  I  t>  1  UNTIL  N  DO  S  !■  S  ♦  AC  I )*BC I >; 
ROUNDTOREAL(S) 

END  IP; 

PROCEDURE  ASSIGN  (REAL  ARRAY  A#  B (*);  INTFOER  VALUE  N); 
FOR  I  ! ■  1  UNTIL  N  DO  A(l)  t>  B(t); 

COMMENT!  Q  IS  USER  IN  CASE  C  COINCIDES  WITH  A  OR  B; 
REAL  ARRAY  Q(1  : :  M#  1st  P); 

FOR  I  !•  1  UNTIL  M  00  FOR  J  :■  1  UNTIL  P  00 
Q(I,J)  !  ■  I  P(A(  I  #*)#  IM*,J),  N); 

FOR  I  t ■  1  UNTIL  M  00  ASSIGN  (C(l,*),  0(1#*)#  P) 

END  MATMULT; 


INTEGER  RANI#  RAN2#  RAN3#  RAN4; 

INTEGER  ARRAY  RAN 5  (0  ::  255); 

PROCEOURE  RANINIT  (INTEGER  VALUE  Rl); 

BEGIN  COMMENT : 

MUST  BE  CALLED  WITH  ANY  INTEGER  Rl 
TO  INITIALIZE  PROCEDURE  RANDOM; 

INTOVFL  :■  NULL;  COMMENT:  MASKS  OFF  INTEGER  OVERFLOW; 

RANI  :-  1;  RAN 2  :•  2*ABS  (Rl)  ♦  1; 

FOR  I  :■  0  UNTIL  255  00  RAN5  Cl)  !•  RAN2  :>  RAN2*65539 
END  RANINIT; 

REAL  PROCEDURE  RANDOM; 

BEGIN  COMMENT: 

USES  TV)  SIMPLE  LFHMER  GENERATORS  OF  THE  FORM 
X(N*1)  -  X(N)*A  (MOD  T)  WITH 
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II 


fi 

0 


n 

0 

o 

ii 

o 

o 

D 

II 


1 


I 

3 

I 

1 


0331  — 
0332  — 
0333  — 
033k  — 
0333  — 
0336  — 
0337  — 
0338  — 
0339  — 
0340  — 
0341  — 
0342  — 
0343  — 
0344  — 
0345  — 
0346  — 
0347  — 
0341  — 
0349  — 
0350  -2 
0351  — 
0352  — 
0353  — 
0354  — 
0355  — 
0356  — 
0357  -- 
0358  — 
0359  — 
0360  — 
0361  — 
0362  — 
0363  2- 
0364  — 
0365  — 
0366  3- 
0367  — 
0368  — 
0369  — 
0370  -- 
0371  — 
0372  — 
0373  — 
0374  — 
0375  — 
0376  — 
0377  — 
0378  — 
0379  — 
0380  — 
0381  — 
0382  4- 
0383  — 
0384  — 
0385  — 
0386  — 
0387  — 
0388  -4 
0389  — 
0390  — 
0391  — 
0392  -3 
0393  -2 
0394  -1 


A1  -  (HOD  Tl)  ■  6433#  T1  •  2»*13-1  •  8191# 

A2  ■  2**16*3  >  65539#  T2  >  2**31  ■  2147413643. 

THE  FIRST  GENERATOR  JUST  POINT!  TO  THE  TABLE  OF 
ENTRIES  FOR  THE  SECOND  GENERATOR#  SO  0000  RANDOM 
NUMBERS  WITH  A  CYCLE  LENGTH  AT  LEAST  2.10**12  ARE 
PROOUCEO. 

THE  IDEA  IS  DUE  TO  MACLAREN  AND  MARSAQLIA#  SEE 
KNUTH#  VOL  2#  PO  30#  ALGORITHM  M. 

REAL  OUTPUT  UNIFORM  IN  (0#1). 

NOTE  THAT  INTEGER  RANI#  RAN2#  RAN 3#  RAN4  AND 
INTEGER  ARRAY  RAN5  <0 1 1 255 )  MUST  BE  DECLARED 
GLOBALLY  AND  RANI  NIT  MUST  RE  CALLED  FOR 
INITIALIZATION; 

RANI  >•  (RAN1*643S)  REM  8191; 

RAN 3  RANI  REM  256; 

RAN4  i-  RAN 5  (RAN3); 

RAN 2  i-  RAN 5  (RAN3)  I-  RAN2  •  65539; 

RAN 4  •  0.465661287'-9 
END  RANDOM; 


PROCEDURE  RANSET  (REAL  ARRAY  A(*#*>;  INTEGER  VALUE  M#  N); 
FOR  I  t>  1  UNTIL  M  DO  FOR  J  |«  1  UNTIL  N  00 
A( I# J)  I ■  RANROM  -  0.5; 


COMMENT:  CALLING  PROGRAM; 

INTEGER  R#  M#  N#  P#  T;  REAL  S#  MAX#  PEL#  SW#  MAXH; 

READ(R); 

WHILE  R  0  PO 

BEGIN  REAPON(M#  M#  P);  RANINIT(R);  WRITEC  n);  WRITEC"  "); 
WRITECR"#  R#  6  M'»#  II#  "  H"#  N# 

••  p..#  p)j 

BEGIN 

REAL  ARRAY  A(1  ::  M#  1  ::  N>; 

REAL  ARRAY  B(1  ::  N#  1  it  P); 

REAL  ARRAY  C#  D#  E(1  ::  M#  1  it  P); 

RANSET  (A#  M#  H)j 
RANSET  (B#  N#  P); 

T  :■  TIME(l); 

MATMULT(A#  B#  D#  M#  N#  P); 

WRITE  ("MATMULT  TIME  "#  TIME<1)  -  T); 

T  :■  TIME(l); 

STRASSEN  (A#  B#  C#  M#  N#  P); 

WRITE  ("STRASSEN  TIME"#  TIMF(l)  -  T);  T  :■  TIME(l); 
WINOGRADtA#  R#  E#  H#  It#  P); 

WR I TE ( "W I NOGRAP  TIME",  TIME(l)  -  T); 

S  :■  MAX  :■  SW  :■  MAXW  :■  0; 

FOR  I  :■  1  UNTIL  M  DO  FOR  J  :■  1  UNTIL  P  PO 
BEGIN  DEL  ABS(C(|#J)  -  B(|#J)); 

IF  MAX  <  DEL  THEN  MAX  :■  DEL; 

S  :■  S  ♦  DEL*DEL; 

DEL  :•  ABS(D( I # J)  -  E(I#J>); 

IF  MAXW  <  DEL  THEN  MAXW  :■  DEL; 

SW  :■  SW  ♦  DEL*DEL 
END; 

WR  I  TEC'S  ",  S  #  "  MAX  "#  MAX  >; 

WRITEC'SW",  SW,  "  MAXW",  MAXW); 

REAO(R) 

END 

END 

END. 
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— *-v  Straasen's  and  Winograd's  algorithms  for  matrix  multiplication  are  investi¬ 
gated  and  compared  with  the  normal  algorithm.  Floating  -  point  error  bounds  are 
obtained,  and  it  is  shown  that  scaling  is  essential  for  numerical  accuracy  using 
Winograd's  method.  In  practical  cases  Winograd's  method  appears  to  be  slightly 
faster  than  the  other  two  methods,  but  the  gain  is,  at  most,  about  20$.  Finally, 
an  attempt  to  generalize  Strassen's  method  is  described. 


